Apparently, sorting is somehow applied incorrectly in mysql 8 (checked on 8.0.33 to 8.0.35) when query has ORDER BY + GROUP BY + GROUP_CONCAT() + COUNT(*) OVER() window function. See test case below (NOTE that it’s synthetic and oversimplified for clarity – obviously in a real case grouping wouldn’t be run on a single table).
Question 1: why in the first query ordering is not applied as I expect it (i.e. ascending by sort
field)?
Question 2: what would be a recommended fix (or work around)? Duplicating the ordering specification into COUNT(*) OVER() doesn’t seem a very elegant or robust solution.
Schema and test data
CREATE TABLE IF NOT EXISTS users (
id INT AUTO_INCREMENT PRIMARY KEY,
username VARCHAR(255),
email VARCHAR(255),
sort INT
);
INSERT INTO users (username, email, sort) VALUES ('user1', '[email protected]', 50);
INSERT INTO users (username, email, sort) VALUES ('user2', '[email protected]', 30);
INSERT INTO users (username, email, sort) VALUES ('user3', '[email protected]', 20);
INSERT INTO users (username, email, sort) VALUES ('user4', '[email protected]', 90);
INSERT INTO users (username, email, sort) VALUES ('user5', '[email protected]', 40);
INSERT INTO users (username, email, sort) VALUES ('user6', '[email protected]', 70);
Unexpected result
Query:
SELECT
sort,
username,
GROUP_CONCAT(email) AS email_concat,
COUNT(*) OVER () AS total_count
FROM users
GROUP BY id
ORDER BY sort;
Result (unexpected, not ordered by sort
):
| sort | username | email_concat | total_count |
| ---- | -------- | ----------------- | ----------- |
| 50 | user1 | [email protected] | 6 |
| 30 | user2 | [email protected] | 6 |
| 20 | user3 | [email protected] | 6 |
| 90 | user4 | [email protected] | 6 |
| 40 | user5 | [email protected] | 6 |
| 70 | user6 | [email protected] | 6 |
Execution plan:
-> Window aggregate with buffering: count(0) OVER ()
-> Table scan on <temporary> (cost=2.5..2.5 rows=0)
-> Temporary table (cost=0..0 rows=0)
-> Group aggregate: group_concat(users.email separator ',')
-> Sort: users.id
-> Stream results (cost=0.85 rows=6)
-> Sort: users.sort (cost=0.85 rows=6)
-> Table scan on users (cost=0.85 rows=6)
Expected result
However, if we add ORDER BY into window function (which, in my understanding, is a no-op/redundant in this case), sorting is applied as expected:
SELECT
sort,
username,
GROUP_CONCAT(email) AS email_concat,
COUNT(*) OVER (ORDER BY sort) AS total_count
FROM users
GROUP BY id
ORDER BY sort;
Result (as wanted, ordered by sort
):
| sort | username | email_concat | total_count |
| ---- | -------- | ----------------- | ----------- |
| 20 | user3 | [email protected] | 1 |
| 30 | user2 | [email protected] | 2 |
| 40 | user5 | [email protected] | 3 |
| 50 | user1 | [email protected] | 4 |
| 70 | user6 | [email protected] | 5 |
| 90 | user4 | [email protected] | 6 |
Execution plan:
-> Sort: users.sort
-> Table scan on <temporary> (cost=2.5..2.5 rows=0)
-> Temporary table (cost=0..0 rows=0)
-> Window aggregate with buffering: count(0) OVER (ORDER BY users.sort )
-> Sort: users.sort
-> Stream results
-> Group aggregate: group_concat(users.email separator ',')
-> Sort: users.id
-> Stream results (cost=0.85 rows=6)
-> Table scan on users (cost=0.85 rows=6)
2
Answers
It appears to be a bug triggered by the use of
GROUP_CONCAT()
, as removing that produces the correct ordering.A workaround is to put the grouped query in a subquery, and use
ORDER BY sort
in the main query.And if you want to get a running count instead of a total count, you need to use
OVER (ORDER BY id)
. This also works around the bug somehow.Question 1: I think, you have a combined issue which leeds to an unexpected behavior. Some facts:
Question 2: A good a approach for problem like this is, that you first select your data, then use it as a subquery and then order the result as a second step: