skip to Main Content

I am doing a beginner level course on Khan Academy for SQL.

CREATE TABLE exercise_logs
    (id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT,
    minutes INTEGER, 
    calories INTEGER,
    heart_rate INTEGER);

.....Inserted data into the table....


SELECT type, SUM(calories) AS total_calories FROM exercise_logs
    GROUP BY type
    HAVING total_calories > 150;

SELECT type, AVG(calories) AS avg_calories FROM exercise_logs
    GROUP BY type
    HAVING avg_calories > 70; 
    
SELECT type FROM exercise_logs GROUP BY type HAVING COUNT(*) >= 2;

I understood the Having clause in the first two ‘Select’ queries. It clearly sums up or takes the average of the calories of each exercise type grouped.

In the third select clause, I did not however understand how after ‘grouping by type’ SQL also registers a count of ‘how many’ have been grouped.

Does it by default register how many rows have been grouped?

The idea that the sum of calories or its average can be queried after grouping seems intuitive, since the calories is an integer attribute.

But there is no attribute that registers the count.

Maybe we can think of the primary key as a unique identifier, and that SQL taps into this knowledge to identify how many such unique identifiers have been grouped together to understand the ‘count’.

But I am looking for a concrete answer.

Thanks for the help in advance.

CREATE TABLE exercise_logs
    (id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT,
    minutes INTEGER, 
    calories INTEGER,
    heart_rate INTEGER);

INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("biking", 30, 115, 110);
INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("biking", 10, 45, 105);
INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("dancing", 15, 200, 120);
INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("dancing", 15, 165, 120);
INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("tree climbing", 30, 70, 90);
INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("tree climbing", 25, 72, 80);
INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("rowing", 30, 70, 90);
INSERT INTO exercise_logs(type, minutes, calories, heart_rate) VALUES ("hiking", 60, 80, 85);

SELECT * FROM exercise_logs;

SELECT type, SUM(calories) AS total_calories FROM exercise_logs GROUP BY type;

SELECT type, SUM(calories) AS total_calories FROM exercise_logs
    GROUP BY type
    HAVING total_calories > 150
    ;

SELECT type, AVG(calories) AS avg_calories FROM exercise_logs
    GROUP BY type
    HAVING avg_calories > 70
    ; 
    
SELECT type FROM exercise_logs GROUP BY type HAVING COUNT(*) >= 2;

This was tried.

I was expecting the last query to look like this:

SELECT type 
FROM exercise_logs 
WHERE COUNT(type) > 2
GROUP BY type;

2

Answers


  1. Every SQL management system keep track of some statistics and hidden data. Included in the hidden data there is a unique identifier other than the primary key you just defined. This UID is used by MySQL to store and manage your row.

    When you run COUNT(Att1), it will result as "How many occurence of my attribute Att1 there are, except NULL values". When you run COUNT(*) you can translate that as "how many rows are in my group". When you don’t have a GROUP BY you consider having only one giant group that contains all your values.

    WHERE is ony for row type of filtering and HAVING for group filtering. So you can’t do WHERE COUNT(*) > 2 and you have to work over a group, so HAVING COUNT(*) > 2.

    You don’t have an explicit attribute in your database that will store the COUNT(*) of your groups but some datastuctures like "Trees" (you’ll probably see that in "Index" chapter) can compute this value in a fraction of a second.

    Here you can have more exemples for WHERE vs HAVING :
    difference-between-where-and-having-clause-in-sql/

    WHERE clause is used to filter the data in a table on basis of a given condition. Below is a simple TSQL statement’s syntax with a WHERE clause.

    The HAVING clause is used in SQL to filter grouped data. To understand the HAVING we need to understand the Group by Clause.

    COUNT is a group data.

    Login or Signup to reply.
  2. The GROUP BY operation creates groups of rows based on the values inside the specified columns — rows with same values in those columns are grouped together, NULL equals NULL for this operation.

    Values inside a column that is not present in the GROUP BY clause can only be accessed in aggregate form e.g. SUM, AVG, COUNT and GROUP_CONCAT. The aggregate functions can be used inside subsequent clauses including:

    • HAVING
    • SELECT
    • ORDER BY

    Suppose you have this data:

    type date calories
    cycling 2023-05-08 100
    cycling 2023-05-09 200
    running 2023-05-08 50
    running 2023-05-09 80

    This (ANSI-compliant) query:

    SELECT type, SUM(calories) AS total_calories
    FROM exercise_logs
    GROUP BY type
    HAVING SUM(calories) > 150;
    

    will be evaluated as follows:

    • GROUP BY gives you two groups: cycling and running, each group encapsulates two rows
    • SUM(calories) is calculated for each group: 300 and 130
    • HAVING filters the grouped results: the cycling group is kept
    • SELECT gives you access to the columns used in GROUP BY and any aggregate function
    • ORDER BY has access to everything the SELECT clause can access, and the aliases defined in the SELECT clause itself

    Likewise for your second and third query. Note that you are not required to match the aggregate functions in SELECT and HAVING clause. You can use any or you can use none.

    Regrading the WHERE clause… it is evaluated before GROUP BY… the rows filtered by the WHERE clause are fed to the GROUP BY clause.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search