I am using using MySQL (InnoDB engine). I am running SQL query with WHERE IN
statement which is also ordered.
I decided to create the perfect index, so my query will not run slow. Surprisingly, after creating the perfect index MySQL is still using filesort when running my query.
My Query:
SELECT *
FROM `events`
WHERE
`events`.`user_id` IN (118, 105, 126, 196, 338)
ORDER BY
`date` ASC,
`planing_hour_id` ASC;
My perfect Index:
ALTER TABLE `events`
ADD INDEX `my_perfect_index` (
`user_id`,
`date` ASC,
`planing_hour_id` ASC
);
Problem:
MySQL still uses filesort even when there is perfect index available. I would expect only Using Index
condition to be present in output of EXPLAIN
query. Why is this not the case?
Disclaimer:
I checked similar topics and none of them has been useful:
2
Answers
You can eliminate "Using filesort" if MySQL’s natural order of reading the rows matches the order you request in your ORDER BY. Thus there is no sorting needed, because they are already read in the right order.
MySQL always reads in index order. Whether it uses a secondary index like yours, or the primary key (also called the clustered index), rows will be read in the order they are stored in that index. So if that order does not match your ORDER BY, then a filesort is required.
(Note "filesort" doesn’t necessarily mean it will be on disk. If there are few rows in the result, the sorting can be done in RAM relatively quickly.)
So why doesn’t your index count as a "perfect" index?
Think of this analogy:
The result:
These are read in index order by
last_name
. But they are not implicitly in order byfirst_name
, the order requested in the query. The query needs a phase to filesort the set of matching rows.If you were to make the "perfect" index this way:
The query might use this index, and eliminate the filesort, because it’s reading the rows in order by
first_name
. However, this can’t also do the search to find rows matchinglast_name
most efficiently.The best it can do is index condition pushdown, which is to tell the storage engine not to return rows that don’t match the index — but the storage engine does have to examine all the indexes to evaluate them. This is not as good for optimization as reading from an index leading with
last_name
.Ultimately, there is no perfect index in cases where the query must search with a range condition (
IN()
qualifies as a range condition), and also do a sort that doesn’t match the column of the range condition.If your concern is high performance you can try rewriting your query as:
This form effectivelly seeks the index five times (albeit hitting the secondary and primary index each time), then unions the result sets, and finally sort them.
In this case a simple index is useful, since the sorting won’t use the index and will require materialization. Nothing bad with it unless the query ends up selecting thousands of rows.