skip to Main Content

Apparently, sorting is somehow applied incorrectly in mysql 8 (checked on 8.0.33 to 8.0.35) when query has ORDER BY + GROUP BY + GROUP_CONCAT() + COUNT(*) OVER() window function. See test case below (NOTE that it’s synthetic and oversimplified for clarity – obviously in a real case grouping wouldn’t be run on a single table).

Question 1: why in the first query ordering is not applied as I expect it (i.e. ascending by sort field)?

Question 2: what would be a recommended fix (or work around)? Duplicating the ordering specification into COUNT(*) OVER() doesn’t seem a very elegant or robust solution.

Schema and test data

CREATE TABLE IF NOT EXISTS users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(255),
    email VARCHAR(255),
    sort INT
);

INSERT INTO users (username, email, sort) VALUES ('user1', '[email protected]', 50);
INSERT INTO users (username, email, sort) VALUES ('user2', '[email protected]', 30);
INSERT INTO users (username, email, sort) VALUES ('user3', '[email protected]', 20);
INSERT INTO users (username, email, sort) VALUES ('user4', '[email protected]', 90);
INSERT INTO users (username, email, sort) VALUES ('user5', '[email protected]', 40);
INSERT INTO users (username, email, sort) VALUES ('user6', '[email protected]', 70);

Unexpected result

Query:

SELECT
    sort,
    username,
    GROUP_CONCAT(email) AS email_concat,
    COUNT(*) OVER () AS total_count
FROM users
GROUP BY id
ORDER BY sort;

Result (unexpected, not ordered by sort):

| sort | username | email_concat      | total_count |
| ---- | -------- | ----------------- | ----------- |
| 50   | user1    | [email protected] | 6           |
| 30   | user2    | [email protected] | 6           |
| 20   | user3    | [email protected] | 6           |
| 90   | user4    | [email protected] | 6           |
| 40   | user5    | [email protected] | 6           |
| 70   | user6    | [email protected] | 6           |

Execution plan:

 -> Window aggregate with buffering: count(0) OVER () 
    -> Table scan on <temporary>  (cost=2.5..2.5 rows=0)
        -> Temporary table  (cost=0..0 rows=0)
            -> Group aggregate: group_concat(users.email separator ',')
                -> Sort: users.id
                    -> Stream results  (cost=0.85 rows=6)
                        -> Sort: users.sort  (cost=0.85 rows=6)
                            -> Table scan on users  (cost=0.85 rows=6)

DB Fiddle 1 (wrong result)

Expected result

However, if we add ORDER BY into window function (which, in my understanding, is a no-op/redundant in this case), sorting is applied as expected:

SELECT
    sort,
    username,
    GROUP_CONCAT(email) AS email_concat,
    COUNT(*) OVER (ORDER BY sort) AS total_count
FROM users
GROUP BY id
ORDER BY sort;

Result (as wanted, ordered by sort):

| sort | username | email_concat      | total_count |
| ---- | -------- | ----------------- | ----------- |
| 20   | user3    | [email protected] | 1           |
| 30   | user2    | [email protected] | 2           |
| 40   | user5    | [email protected] | 3           |
| 50   | user1    | [email protected] | 4           |
| 70   | user6    | [email protected] | 5           |
| 90   | user4    | [email protected] | 6           |

Execution plan:

 -> Sort: users.sort
    -> Table scan on <temporary>  (cost=2.5..2.5 rows=0)
        -> Temporary table  (cost=0..0 rows=0)
            -> Window aggregate with buffering: count(0) OVER (ORDER BY users.sort ) 
                -> Sort: users.sort
                    -> Stream results
                        -> Group aggregate: group_concat(users.email separator ',')
                            -> Sort: users.id
                                -> Stream results  (cost=0.85 rows=6)
                                    -> Table scan on users  (cost=0.85 rows=6)

DB Fiddle 2 (correct result)

2

Answers


  1. It appears to be a bug triggered by the use of GROUP_CONCAT(), as removing that produces the correct ordering.

    A workaround is to put the grouped query in a subquery, and use ORDER BY sort in the main query.

    SELECT *
    FROM (
      SELECT
          sort,
          username,
          GROUP_CONCAT(email) AS email_concat,
          COUNT(*) OVER () AS total_count
      FROM users
      GROUP BY id) AS x
    ORDER BY sort;
    

    And if you want to get a running count instead of a total count, you need to use OVER (ORDER BY id). This also works around the bug somehow.

    Login or Signup to reply.
  2. Question 1: I think, you have a combined issue which leeds to an unexpected behavior. Some facts:

    1. First of all, when using GROUP BY, the columns in the SELECT clause must either be part of the grouping or be used with an aggregate function. Grouping by id makes no sense, as all id’s are different
    2. COUNT(*) OVER is a window function which will applied after the GROUP BY clause
    3. COUNT(*) OVER (), without ORDER BY, calculates a count of all rows in the result set without regard to any specific order

    Question 2: A good a approach for problem like this is, that you first select your data, then use it as a subquery and then order the result as a second step:

    SELECT sort, username, email_concat, total_count
    FROM (
      ... your query ...
    ) AS q1
    ORDER BY sort;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search