There is a table with these columns:
Name | Type |
---|---|
Id | int |
DateTime | datetime(6) |
CompanyId | int (FK) |
IsExcluded | tinyint(1) |
There are 2 BTREE indexes:
- CompanyId
- ComapnyId,DateTime,IsExcluded
Following select is using the first index with only one column and this select takes 2.3sec. When I force the second index select takes 0.015sec. Also when I decrease DateTime range by one day MySQL is using the second index without forcing.
select IsExcluded,DateTime,CompanyId FROM table where
IsExcluded = 0 and
DateTime >= '2022-06-02' and
DateTime < '2022-09-22' and
CompanyId = 1;
I understand that if more than ~20%-30% of rows are selected, MySQL could decide to ignore index, but I don’t understand why MySQL is selecting different obviously not the most suitable index.
Is there any way how to setup or "learn" MySql that second index for this query is the most suitable (without inserting anything else to query) ?
3
Answers
Updated
As per my understanding the order of keys does matter.
Check also related article
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
More info at MySQL doc
You can in theory tweak the index statistics manually to influence the optimizer’s choice. See https://dev.mysql.com/blog-archive/histogram-statistics-in-mysql/
But frankly, I know of no developers who use this feature. It’s just too difficult to figure out how to use it, and any custom index statistics you choose would be likely to get out of date very quickly.
What is more common is to use index hints to tell the optimizer to consider only a specific index, or to ignore other indexes. See https://dev.mysql.com/doc/refman/8.0/en/index-hints.html
Example:
(I’m guessing at the name of your 3-column index.)
I would guess the reason your index is not used is that the
DateTime
portion is not helping enough to justify the wider index (i.e. your date range covers too many rows), and theIsExcluded
portion is not used at all, because it follows a column that is used in a range condition. So the optimizer chooses an index that’s more compact, so it can load the index into RAM in fewer I/O reads.What’s important to understand about column order in the index is that the columns involved in equality conditions should be to the left. Then you can have one column that is used in an inequality or range condition, and any subsequent columns in the index are not used for searching or sorting.
In your case you have an index on
(CompanyId,DateTime,IsExcluded)
, but the second column is used in a range condition, so the third column isn’t used for the search. At best it could try to use index condition pushdown, but that’s not as good as really narrowing down the search.The better index would be to order the columns so the
DateTime
column is last. Either of the columns used for equality may be first, but they both have to be before theDateTime
column. Either of these orders would be better in this case:Given a ‘smaller’
INDEX(a)
and a ‘larger’INDEX(a,b)
, the Optimizer will often use the smaller index even though the larger one would do more filtering.This is a recurring problem, I have seen it in many situations; there needs to be a bug report encouraging them to fix the Optimizer.
The workaround is to DROP the smaller index.
Furthermore, as Bill points out, your larger index is not optimal. In general, move any range column to the end of the index.
There’s another point here: For that
SELECT
your 3-column index should have been picked because it is "covering". That means that all of the columns anywhere in theSELECT
are in the singleINDEX
(in any order). (Bill’s recommendations are both covering and faster.)