skip to Main Content

I’m working on an exercise from "Murach’s MySQL 3rd Edition" which gives the following prompt:

"11. Write a SELECT statement that uses an aggregate window function to calculate a moving average of the sum of invoice totals. Return these columns:

  • The month of the invoice date from the Invoices table
  • The sum of the invoice totals from the Invoices table
  • The moving average of the invoice totals sorted by invoice month

The result set should be grouped by invoice month and the frame for the
moving average should include the current row plus three rows before the
current row."

The invoices table being referenced looks like this (114 total rows):

invoice_id vendor_id invoice_number invoice_date invoice_total payment_total credit_total terms_id invoice_due_date payment_date
1 122 989319-457 2018-04-08 3813.33 3813.33 0 3 2018-05-08 2018-05-07

The solution I came up with initially was this:

SELECT
    EXTRACT(MONTH FROM invoice_date) AS months, 
    SUM(invoice_total) AS invoice_total_sum,
    AVG(invoice_total) OVER(
        ORDER BY EXTRACT(MONTH FROM invoice_date)
        ROWS 3 PRECEDING
    ) AS rolling_avg
FROM
    invoices
GROUP BY
    months;

When I run this, I get the following error:

Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'ap.invoices.invoice_total' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

After looking at the Author’s provided solution, and re-reading the question, I realized I was supposed to take the average of the monthly sums, not just the average of the invoices themselves. So the real solution is this:

SELECT
    EXTRACT(MONTH FROM invoice_date) AS months, 
    SUM(invoice_total) AS invoice_total_sum,
    AVG(SUM(invoice_total)) OVER( # why does this need the SUM() function?
        ORDER BY EXTRACT(MONTH FROM invoice_date)
        ROWS 3 PRECEDING
    ) AS rolling_avg
FROM
    invoices
GROUP BY
    months;

This produces an output table like so (with 5 rows):

months invoice_total_sum rolling_avg
4 5828.18 5828.180000

Where I’m not understanding, however, is why the first solution doesn’t run. I know why it isn’t what the exercise is looking for, but I don’t see what’s causing it to error out.

My understanding is that the reason aggregate functions like sum() and avg() give errors about ‘functional dependence’ is because otherwise you’d get an inconsistent number of output rows. If I was trying to use just sum(invoice_total) and invoice_date , it would produce one row for the sum and many for the dates, and wouldn’t be able to resolve them into an output table (without changing the only_full_group_by mode). But the average function is also an aggregate function, so why doesn’t it just average all the invoice totals for each month, as specified by the group by clause?

2

Answers


  1. Chosen as BEST ANSWER

    The comment from lemon was what helped me figure it out. The avg() function is associated with the open() function, so it isn't available when the group by clause takes effect.


  2. (1) The AVG is an aggregate function, but here it is used with an OVER clause which makes it a window function. With window function, the result is displayed on each row. Whereas with a regular aggregate function it combines multiple rows and displays result on one row.

    In your query you could remove the SUM like below and can see that the grouping of AVG is there only for the current row and the preceding 3 rows; the results are displayed on each row

    SELECT
        EXTRACT(MONTH FROM invoice_date) AS months,
        invoice_date, 
        AVG(invoice_total) OVER(
            ORDER BY EXTRACT(MONTH FROM invoice_date)
            ROWS 3 PRECEDING
        ) AS rolling_avg
    FROM
        invoices
    ORDER BY EXTRACT(MONTH FROM invoice_date)
    

    (2) Your next question was why the SUM(invoice_total) is inside of AVG.
    Here the AVG function is working over the SUM of rows which are already GROUPed BY EXTRACT(MONTH FROM invoice_date). Inside the window function the grouped rows are ORDERed BY so that the current row looks back over past 3 SUM’ed rows (which is past 3 months) to get the AVG

    Hope this helps

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search