skip to Main Content

I understand that this query will produce numbers and cumulative tallies:

SELECT number
     , SUM(number) OVER (ORDER BY number) cumulative_number
  FROM (SELECT 1 number UNION SELECT 2 UNION SELECT 3) inside
 GROUP BY number

Result:

number cumulative_number
1 1
2 3
3 6

And I sort of get how this function works. There are partitions constructed by my GROUP BY clause "GROUP BY number". My partitions are number = 1, number = 2 and number = 3. And for example in the number = 2 case, somehow the SUM() function is magically accessing rows outside its partition which is somehow designated by my window_spec order_clause "ORBER BY number".

👀 But what I don’t understand is how the MySQL documentation of ORDER BY in this context has anything to do with anything.

order_clause: An ORDER BY clause indicates how to sort rows in each partition. Partition rows that are equal according to the ORDER BY clause are considered peers. If ORDER BY is omitted, partition rows are unordered, with no processing order implied, and all partition rows are peers.

A partition is a set. Sets are unsorted according to first-grade math. So sorting something inside a partition is already tripping my BS detector. Even if you tell me that sets are ordered, the sum function I learned in first grade certainly does not care what order its operands are sorted.


Extending this misunderstanding to another concrete example:

SELECT ROUND(number, -1) tens
     , SUM(COUNT(*)) OVER (ORDER BY number) cumulative_number
  FROM (SELECT 1 number UNION SELECT 2 UNION SELECT 11 UNION SELECT 12) inside
 GROUP BY tens

Assuming that MySQL documentation means something useful and is not just wrong words. Let’s consider how ORDER BY applies here by numbers. The 11 does not equal the 12, they are different and so an ORDER BY differentiates them. They are both in the tens = 10 partition. So magically somehow the SUM() function needs to pick the sum of 3 or the sum of 4 expression result from the 11 or 12 row from inside the tens = 10 partition.

👀 How does it pick the result it does (spoiler: it picks the sum of 4)?

And also…

👀 What fundamental wisdom in the MySQL documentation am I failing to appreciate, or how is that documentation deficient?

2

Answers


  1. In the query:

    SELECT number,
      SUM(number) OVER (ORDER BY number) cumulative_number
    FROM (SELECT 1 number UNION SELECT 2 UNION SELECT 3) inside
    GROUP BY number
    

    Since you didn’t use a PARTITION BY clause in the window function, the window function has a single partition. That is, it considers all the resulting rows of the aggregation as a single partition.

    The aggregation returns three rows:

    • 1
    • 2
    • 3

    Then, the window function SUM() computes the running sum of it (1, 3, 6). There’s no magic in it.

    Login or Signup to reply.
  2. Window functions allow to show individual values and aggregate values at same time. Maybe it is the GROUP BY number that is confusing you.

    The query:

    SELECT number
         , SUM(number) OVER (ORDER BY number) cumulative_number
      FROM (SELECT 1 number UNION SELECT 2 UNION SELECT 3) inside
     GROUP BY number
    

    Produces the same result as the query:

    SELECT number
         , SUM(number) OVER (ORDER BY number) cumulative_number
      FROM (SELECT 1 number UNION SELECT 2 UNION SELECT 3) inside
    

    The wndow function treats all rows as one partition (unless you use PARTITION BY). It does aggregation for each row. The SUM(number) OVER (ORDER BY number) applies to all rows. As the query has only one normal column and a window function, the GROUP BY will do nothing.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search