I’m working on an exercise from "Murach’s MySQL 3rd Edition" which gives the following prompt:
"11. Write a SELECT statement that uses an aggregate window function to calculate a moving average of the sum of invoice totals. Return these columns:
- The month of the invoice date from the Invoices table
- The sum of the invoice totals from the Invoices table
- The moving average of the invoice totals sorted by invoice month
The result set should be grouped by invoice month and the frame for the
moving average should include the current row plus three rows before the
current row."
The invoices table being referenced looks like this (114 total rows):
invoice_id | vendor_id | invoice_number | invoice_date | invoice_total | payment_total | credit_total | terms_id | invoice_due_date | payment_date |
---|---|---|---|---|---|---|---|---|---|
1 | 122 | 989319-457 | 2018-04-08 | 3813.33 | 3813.33 | 0 | 3 | 2018-05-08 | 2018-05-07 |
The solution I came up with initially was this:
SELECT
EXTRACT(MONTH FROM invoice_date) AS months,
SUM(invoice_total) AS invoice_total_sum,
AVG(invoice_total) OVER(
ORDER BY EXTRACT(MONTH FROM invoice_date)
ROWS 3 PRECEDING
) AS rolling_avg
FROM
invoices
GROUP BY
months;
When I run this, I get the following error:
Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'ap.invoices.invoice_total' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
After looking at the Author’s provided solution, and re-reading the question, I realized I was supposed to take the average of the monthly sums, not just the average of the invoices themselves. So the real solution is this:
SELECT
EXTRACT(MONTH FROM invoice_date) AS months,
SUM(invoice_total) AS invoice_total_sum,
AVG(SUM(invoice_total)) OVER( # why does this need the SUM() function?
ORDER BY EXTRACT(MONTH FROM invoice_date)
ROWS 3 PRECEDING
) AS rolling_avg
FROM
invoices
GROUP BY
months;
This produces an output table like so (with 5 rows):
months | invoice_total_sum | rolling_avg |
---|---|---|
4 | 5828.18 | 5828.180000 |
Where I’m not understanding, however, is why the first solution doesn’t run. I know why it isn’t what the exercise is looking for, but I don’t see what’s causing it to error out.
My understanding is that the reason aggregate functions like sum()
and avg()
give errors about ‘functional dependence’ is because otherwise you’d get an inconsistent number of output rows. If I was trying to use just sum(invoice_total)
and invoice_date
, it would produce one row for the sum and many for the dates, and wouldn’t be able to resolve them into an output table (without changing the only_full_group_by mode). But the average function is also an aggregate function, so why doesn’t it just average all the invoice totals for each month, as specified by the group by
clause?
2
Answers
The comment from lemon was what helped me figure it out. The
avg()
function is associated with theopen()
function, so it isn't available when thegroup by
clause takes effect.(1) The
AVG
is an aggregate function, but here it is used with anOVER
clause which makes it a window function. With window function, the result is displayed on each row. Whereas with a regular aggregate function it combines multiple rows and displays result on one row.In your query you could remove the
SUM
like below and can see that the grouping ofAVG
is there only for the current row and the preceding 3 rows; the results are displayed on each row(2) Your next question was why the
SUM(invoice_total)
is inside ofAVG
.Here the AVG function is working over the SUM of rows which are already GROUPed BY EXTRACT(MONTH FROM invoice_date). Inside the window function the grouped rows are ORDERed BY so that the current row looks back over past 3 SUM’ed rows (which is past 3 months) to get the
AVG
Hope this helps