SELECT First_name,last_name,email,ROUND (AVG(Amount),2) AS moneyspent
FROM Customer INNER JOIN Payment
ON Customer.customer_id= Payment.customer_id
WHERE Staff_id=2
GROUP BY First_name,last_name,email
HAVING SUM(Amount)>100
ORDER BY Moneyspent DESC
first_name | last_name | moneyspent | |
---|---|---|---|
Brittany | Riley | [email protected] | 5.83 |
Arnold | Havens | [email protected] | 5.14 |
Eleanor | Hunt | [email protected] | 5.04 |
Stacey | Montgomery | [email protected] | 4.73 |
Karl | Seal | [email protected] | 4.63 |
I am confused about the Order of execution. I understand we cannot filter based on an aggregated result like AVG(Amount)
in WHERE
because that hasn’t been executed yet. Does AVG(Amount)
happen at the GROUP BY
or SELECT
?
-
If it happens at
GROUP BY
, there should be three columnsFirst_name
,Last_name
,email
andAVG(Amount)
left. How canHAVING SUM(AMOUNT)
still be executing? -
If it happens at
SELECT
, then how come we can still filter usingAVG(Amount)
atHAVING
, asHAVING
comes beforeSELECT
?
2
Answers
When processing your
GROUP BY
the query execution engine will also evaluateSUM(Amount)
(as well asAVG(Amount)
) for each first_name, last_name, email combination; because these are referred to by your SELECT and HAVING, respectively. Some RDBMSs don’t allowHAVING
referring to an expression not already on the SELECT clause, but I guess PostgreSql is doing us a favour and allowing it. For your purposes you should assume that those aggregations are calculated at the GROUP BY time (HAVING is applied afterwards; but this is really a ‘logical ‘ order, because the engine can apply parts of it earlier if it deduces it would be cheaper and that it wouldn’t change results).The
GROUP BY
andHAVING
clauses can be thought of as evaluated before theSELECT
clauses for when considering what the result would be, but any aggregate functions used inSELECT
column definitions still have access to the underlying disaggregated data.The order to think about it is:
JOIN
s andWHERE
clauseGROUP BY
; note that however, you can also group by columns in the output query, e.g.GROUP BY 1, 2
but this is syntatic sugarHAVING
SELECT
You can see that the actual calculation of the resulting columns occurs last because if you use a window function as part of the
SELECT
to access (for example) the previous row you can only access data that you have either included inGROUP BY
or that uses an aggregate function, and data removed by theHAVING
orWHERE
clause is filtered out.