I have a scenario whereby I am trying to fetch a user’s most recent messages from over 2 million rows and group them by their parent (or thread) id. However, this grouping is causing the query time to be around 1s, about 1000 times slower than without the group by.
Here is the table
CREATE TABLE `msg` (
`msg_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`msg_to` int(10) unsigned NOT NULL,
`msg_from` int(10) unsigned NOT NULL,
`msg` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
`date` timestamp NOT NULL DEFAULT current_timestamp(),
`parent` int(10) unsigned NOT NULL,
PRIMARY KEY (`msg_id`),
KEY `msg_toIX` (`msg_to`) USING BTREE,
KEY `msg_fromIX` (`msg_from`) USING BTREE,
KEY `parentIX` (`parent`) USING BTREE )
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Here is my query
SELECT a.msg_id, a.msg_from, a.msg FROM msg a
JOIN(SELECT MAX(msg_id) maxid FROM msg WHERE msg_to = 23 GROUP BY parent ORDER BY msg_id DESC LIMIT 10) b
ON a.msg_id IN (b.maxid)
ORDER BY a.msg_id DESC LIMIT 10
Is this as good as it gets or should I be getting better performance seeing as I can extract 10 thousand rows in 0.001 with a condition and without the group by clause? Am I going about this the wrong way?
Thanks for your continued guidance and support
2
Answers
Create a composite index for both
parent
andmsg_id
so that getting the maximum ID for each parent is optimized.Try this compound index. It covers, I believe, your inner query.
The inner query should be fulfillable by a range scan on this index.