MySql group by is slow

midget
October 19, 2023
148 views
0 votes
2 Answers

I have a scenario whereby I am trying to fetch a user’s most recent messages from over 2 million rows and group them by their parent (or thread) id. However, this grouping is causing the query time to be around 1s, about 1000 times slower than without the group by.

Here is the table

CREATE TABLE `msg` (
`msg_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`msg_to` int(10) unsigned NOT NULL,
`msg_from` int(10) unsigned NOT NULL,
`msg` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`date` timestamp NOT NULL DEFAULT current_timestamp(),
`parent` int(10) unsigned NOT NULL,
 PRIMARY KEY (`msg_id`),
 KEY `msg_toIX` (`msg_to`) USING BTREE,
 KEY `msg_fromIX` (`msg_from`) USING BTREE,
 KEY `parentIX` (`parent`) USING BTREE ) 
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Here is my query

SELECT a.msg_id, a.msg_from, a.msg FROM msg a 
JOIN(SELECT MAX(msg_id) maxid FROM msg WHERE msg_to = 23 GROUP BY parent ORDER BY msg_id DESC LIMIT 10) b
ON a.msg_id IN (b.maxid)
ORDER BY a.msg_id DESC LIMIT 10

Explained

Is this as good as it gets or should I be getting better performance seeing as I can extract 10 thousand rows in 0.001 with a condition and without the group by clause? Am I going about this the wrong way?

Thanks for your continued guidance and support

Tags: group-by mysql

Answers

Create a composite index for both parent and msg_id so that getting the maximum ID for each parent is optimized.

CREATE TABLE `msg` (
`msg_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`msg_to` int(10) unsigned NOT NULL,
`msg_from` int(10) unsigned NOT NULL,
`msg` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`date` timestamp NOT NULL DEFAULT current_timestamp(),
`parent` int(10) unsigned NOT NULL,
 PRIMARY KEY (`msg_id`),
 KEY `msg_toIX` (`msg_to`) USING BTREE,
 KEY `msg_fromIX` (`msg_from`) USING BTREE,
 KEY `parentIX` (`parent`, msg_id) USING BTREE ) 
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

- OJones
- October 19, 2023 at 1:01 am
- 0 votes
0
Try this compound index. It covers, I believe, your inner query.
```
CREATE INDEX to_parent_id ON msg (msg_to, parent, msg_id);
```
The inner query should be fulfillable by a range scan on this index.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.