I’m fairly new to window functions and have been learning about them recently. There is this query which returns the total sales for each month and quarter using a group by and an aggregate function. Additionally, it returns the total sales for the whole year (using a window function) and the max total sales for each quarter (using a window function). This all makes sense to me.
Query:
SELECT
quarter(payment_date) quarter,
monthname(payment_date) month_name,
SUM(amount) monthly_sales,
MAX(SUM(amount)) OVER() max_overall_values,
MAX(SUM(amount)) OVER(PARTITION BY quarter(payment_date)) max_quarter_sales
FROM
payment
WHERE
year(payment_date) = 2005
GROUP BY
quarter(payment_date),
monthname(payment_date)
ORDER BY
monthname(payment_date) DESC;
Results:
+---------+------------+---------------+--------------------+-------------------+
| quarter | month_name | monthly_sales | max_overall_values | max_quarter_sales |
+---------+------------+---------------+--------------------+-------------------+
| 2 | May | 4824.43 | 28373.89 | 9631.88 |
| 2 | June | 9631.88 | 28373.89 | 9631.88 |
| 3 | July | 28373.89 | 28373.89 | 28373.89 |
| 3 | August | 24072.13 | 28373.89 | 28373.89 |
+---------+------------+---------------+--------------------+-------------------+
I start to loose track of what’s going on if I remove "max":
SELECT
quarter(payment_date) quarter,
monthname(payment_date) month_name,
SUM(amount) monthly_sales,
SUM(amount) OVER() max_overall_values,
SUM(amount) OVER(PARTITION BY quarter(payment_date)) max_quarter_sales
FROM
payment
WHERE
year(payment_date) = 2005
GROUP BY
quarter(payment_date),
monthname(payment_date)
ORDER BY
monthname(payment_date) DESC;
I get the following results:
+---------+------------+---------------+--------------------+-------------------+
| quarter | month_name | monthly_sales | max_overall_values | max_quarter_sales |
+---------+------------+---------------+--------------------+-------------------+
| 2 | May | 4824.43 | 19.96 | 8.98 |
| 2 | June | 9631.88 | 19.96 | 8.98 |
| 3 | July | 28373.89 | 19.96 | 10.98 |
| 3 | August | 24072.13 | 19.96 | 10.98 |
+---------+------------+---------------+--------------------+-------------------+
My question is what data is the "max" window function actually processing when it’s used in this context, which involves a group by clause, and
how is it arriving at the calculation of 19.96 for max_overall_values, 8.98 for a quarter and 10.98 for the other when "max" is removed?
2
Answers
If you leave off the sum, the amount summed by the window function will be the amount that would have been returned if it were included in the select columns (e.g. SELECT amount, quarter(payment_date) quarter…).
Because different rows being grouped will have different amounts, this amount will be the amount column from an arbitrary row of those being grouped. This is almost never a useful thing, and indeed modern versions of mysql have a default ONLY_FULL_GROUP_BY sql_mode that returns an error if you attempt this in your query:
fiddle
Just sharing some findings.
Experiment
Prepare data as below. Two identical tables:
payment1
andpayment2
but rows were inserted in opposite order. (create table payment2 as select * from payment1 order by payment_date desc
)Calculating monthly_sales for reference:
max(sum())
and everything looks good.sum()
only. It seems that only first row of each month is picked (I tried use CTE to order bypayment_date
ASC
orDESC
but no difference)query against
payment2
:query against
payment2
:query against
payment2
:Environment:
ONLY_FULL_GROUP_BY
in session.Observations:
GROUP BY
quarter & month seems to make MySQL to pick the first row in physical location of each group (May, June, July, August) and then apply aggregate functions (sum(), max(), min())payment1
andpayment1
are identical but stored in opposite order and the same queries ran against the two tables returned different results.ONLY_FULL_GROUP_BY
sql_mode
may be harmful due to inconsistent results.A Question for @Robert:
Could you please run similar experiments on your end to see what the outcome looks like? My environment is very limited. Thanks