skip to Main Content
SELECT First_name,last_name,email,ROUND (AVG(Amount),2) AS moneyspent
FROM Customer INNER JOIN Payment
ON Customer.customer_id= Payment.customer_id
WHERE Staff_id=2 
GROUP BY First_name,last_name,email
HAVING SUM(Amount)>100
ORDER BY Moneyspent DESC
first_name last_name email moneyspent
Brittany Riley [email protected] 5.83
Arnold Havens [email protected] 5.14
Eleanor Hunt [email protected] 5.04
Stacey Montgomery [email protected] 4.73
Karl Seal [email protected] 4.63

I am confused about the Order of execution. I understand we cannot filter based on an aggregated result like AVG(Amount) in WHERE because that hasn’t been executed yet. Does AVG(Amount) happen at the GROUP BY or SELECT?

  1. If it happens at GROUP BY, there should be three columns First_name, Last_name, email and AVG(Amount) left. How can HAVING SUM(AMOUNT) still be executing?

  2. If it happens at SELECT, then how come we can still filter using AVG(Amount) at HAVING, as HAVING comes before SELECT?

2

Answers


  1. When processing your GROUP BY the query execution engine will also evaluate SUM(Amount) (as well as AVG(Amount)) for each first_name, last_name, email combination; because these are referred to by your SELECT and HAVING, respectively. Some RDBMSs don’t allow HAVING referring to an expression not already on the SELECT clause, but I guess PostgreSql is doing us a favour and allowing it. For your purposes you should assume that those aggregations are calculated at the GROUP BY time (HAVING is applied afterwards; but this is really a ‘logical ‘ order, because the engine can apply parts of it earlier if it deduces it would be cheaper and that it wouldn’t change results).

    Login or Signup to reply.
  2. The GROUP BY and HAVING clauses can be thought of as evaluated before the SELECT clauses for when considering what the result would be, but any aggregate functions used in SELECT column definitions still have access to the underlying disaggregated data.

    The order to think about it is:

    1. JOINs and WHERE clause
    2. GROUP BY; note that however, you can also group by columns in the output query, e.g. GROUP BY 1, 2 but this is syntatic sugar
    3. HAVING
    4. SELECT

    You can see that the actual calculation of the resulting columns occurs last because if you use a window function as part of the SELECT to access (for example) the previous row you can only access data that you have either included in GROUP BY or that uses an aggregate function, and data removed by the HAVING or WHERE clause is filtered out.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search