I am doing a beginner level course on Khan Academy for SQL.
CREATE TABLE exercise_logs
(id INTEGER PRIMARY KEY AUTOINCREMENT,
type TEXT,
minutes INTEGER,
calories INTEGER,
heart_rate INTEGER);
.....Inserted data into the table....
SELECT type, SUM(calories) AS total_calories FROM exercise_logs
GROUP BY type
HAVING total_calories > 150;
SELECT type, AVG(calories) AS avg_calories FROM exercise_logs
GROUP BY type
HAVING avg_calories > 70;
SELECT type FROM exercise_logs GROUP BY type HAVING COUNT(*) >= 2;
I understood the Having clause in the first two ‘Select’ queries. It clearly sums up or takes the average of the calories of each exercise type grouped.
In the third select clause, I did not however understand how after ‘grouping by type’ SQL also registers a count of ‘how many’ have been grouped.
Does it by default register how many rows have been grouped?
The idea that the sum of calories or its average can be queried after grouping seems intuitive, since the calories is an integer attribute.
But there is no attribute that registers the count.
Maybe we can think of the primary key as a unique identifier, and that SQL taps into this knowledge to identify how many such unique identifiers have been grouped together to understand the ‘count’.
But I am looking for a concrete answer.
Thanks for the help in advance.
CREATE TABLE exercise_logs
(id INTEGER PRIMARY KEY AUTOINCREMENT,
type TEXT,
minutes INTEGER,
calories INTEGER,
heart_rate INTEGER);
INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("biking", 30, 115, 110);
INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("biking", 10, 45, 105);
INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("dancing", 15, 200, 120);
INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("dancing", 15, 165, 120);
INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("tree climbing", 30, 70, 90);
INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("tree climbing", 25, 72, 80);
INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("rowing", 30, 70, 90);
INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("hiking", 60, 80, 85);
SELECT * FROM exercise_logs;
SELECT type, SUM(calories) AS total_calories FROM exercise_logs GROUP BY type;
SELECT type, SUM(calories) AS total_calories FROM exercise_logs
GROUP BY type
HAVING total_calories > 150
;
SELECT type, AVG(calories) AS avg_calories FROM exercise_logs
GROUP BY type
HAVING avg_calories > 70
;
SELECT type FROM exercise_logs GROUP BY type HAVING COUNT(*) >= 2;
This was tried.
I was expecting the last query to look like this:
SELECT type
FROM exercise_logs
WHERE COUNT(type) > 2
GROUP BY type;
2
Answers
Every SQL management system keep track of some statistics and hidden data. Included in the hidden data there is a unique identifier other than the primary key you just defined. This UID is used by MySQL to store and manage your row.
When you run
COUNT(Att1)
, it will result as "How many occurence of my attribute Att1 there are, except NULL values". When you runCOUNT(*)
you can translate that as "how many rows are in my group". When you don’t have aGROUP BY
you consider having only one giant group that contains all your values.WHERE
is ony for row type of filtering andHAVING
for group filtering. So you can’t doWHERE COUNT(*) > 2
and you have to work over a group, soHAVING COUNT(*) > 2
.You don’t have an explicit attribute in your database that will store the
COUNT(*)
of your groups but some datastuctures like "Trees" (you’ll probably see that in "Index" chapter) can compute this value in a fraction of a second.Here you can have more exemples for
WHERE
vsHAVING
:difference-between-where-and-having-clause-in-sql/
COUNT
is a group data.The
GROUP BY
operation creates groups of rows based on the values inside the specified columns — rows with same values in those columns are grouped together, NULL equals NULL for this operation.Values inside a column that is not present in the
GROUP BY
clause can only be accessed in aggregate form e.g.SUM
,AVG
,COUNT
andGROUP_CONCAT
. The aggregate functions can be used inside subsequent clauses including:HAVING
SELECT
ORDER BY
Suppose you have this data:
This (ANSI-compliant) query:
will be evaluated as follows:
GROUP BY
gives you two groups: cycling and running, each group encapsulates two rowsSUM(calories)
is calculated for each group: 300 and 130HAVING
filters the grouped results: the cycling group is keptSELECT
gives you access to the columns used inGROUP BY
and any aggregate functionORDER BY
has access to everything theSELECT
clause can access, and the aliases defined in theSELECT
clause itselfLikewise for your second and third query. Note that you are not required to match the aggregate functions in
SELECT
andHAVING
clause. You can use any or you can use none.Regrading the
WHERE
clause… it is evaluated beforeGROUP BY
… the rows filtered by theWHERE
clause are fed to theGROUP BY
clause.