Postgresql - order of execution confusion

Kartik
June 13, 2023
175 views
0 votes
2 Answers

SELECT First_name,last_name,email,ROUND (AVG(Amount),2) AS moneyspent
FROM Customer INNER JOIN Payment
ON Customer.customer_id= Payment.customer_id
WHERE Staff_id=2 
GROUP BY First_name,last_name,email
HAVING SUM(Amount)>100
ORDER BY Moneyspent DESC

first_name	last_name	email	moneyspent
Brittany	Riley	[email protected]	5.83
Arnold	Havens	[email protected]	5.14
Eleanor	Hunt	[email protected]	5.04
Stacey	Montgomery	[email protected]	4.73
Karl	Seal	[email protected]	4.63

I am confused about the Order of execution. I understand we cannot filter based on an aggregated result like AVG(Amount) in WHERE because that hasn’t been executed yet. Does AVG(Amount) happen at the GROUP BY or SELECT?

If it happens at GROUP BY, there should be three columns First_name, Last_name, email and AVG(Amount) left. How can HAVING SUM(AMOUNT) still be executing?
If it happens at SELECT, then how come we can still filter using AVG(Amount) at HAVING, as HAVING comes before SELECT?

Answers

- tinazmu
- June 13, 2023 at 1:53 am
- 0 votes
0
When processing your GROUP BY the query execution engine will also evaluate SUM(Amount) (as well as AVG(Amount)) for each first_name, last_name, email combination; because these are referred to by your SELECT and HAVING, respectively. Some RDBMSs don’t allow HAVING referring to an expression not already on the SELECT clause, but I guess PostgreSql is doing us a favour and allowing it. For your purposes you should assume that those aggregations are calculated at the GROUP BY time (HAVING is applied afterwards; but this is really a ‘logical ‘ order, because the engine can apply parts of it earlier if it deduces it would be cheaper and that it wouldn’t change results).

Login or Signup to reply.

- GeorgeS
- June 13, 2023 at 4:36 am
- 0 votes
0
The GROUP BY and HAVING clauses can be thought of as evaluated before the SELECT clauses for when considering what the result would be, but any aggregate functions used in SELECT column definitions still have access to the underlying disaggregated data.

The order to think about it is:
1. JOINs and WHERE clause
2. GROUP BY; note that however, you can also group by columns in the output query, e.g. GROUP BY 1, 2 but this is syntatic sugar
3. HAVING
4. SELECT
You can see that the actual calculation of the resulting columns occurs last because if you use a window function as part of the SELECT to access (for example) the previous row you can only access data that you have either included in GROUP BY or that uses an aggregate function, and data removed by the HAVING or WHERE clause is filtered out.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – order of execution confusion

Answers