I am wondering if the following MySql SELECT query takes O(N) or O(logN).
Let us we have a table represents 4 integer intervals [startNum, endNum].
And the table is indexed by the startNum and endNum columns.
startNum, endNum
3, 8
10, 15
16, 21
28, 42
Query:
SELECT * from table
where startNum <= 19 AND endNum >= 19
I think MySql will take O(N), because it will
1. find the first 3 rows using the "startNum"; then
2. go through each of them and use the "endNum" to identify the 3rd row; then
3. return the 3rd row [16, 21] as the result.
Is MySql "smart" enough to do the following?
1. binary search on the startNum to find the position of the 3rd row, since "startNum" is sorted; then
2. binary search on the endNum to find the 3rd row again, since "endNum" is also sorted; then
3. return the 3rd row [16, 21] as the result.
From this documentation: https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html
If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE, the
optimizer uses it but considers no more key parts.
I don’t think MySql is doing the "smart" binary searches.
Am I right?
Is there any configuration that can enable MySql do the binary searches?
2
Answers
You are correct. It’s O(n). If you have indexes on both startNum and endNum, the query planner will choose one index. Based on table statistics it will try to choose the index that is more selective.
Then it will random-access that index to the first eligible row and proceed to scan the rest of the table to satisfy the other inequality predicate. This is in the nature of BTREE indexing. The situation is the same in every make of table server that uses BTREE indexing, not just MySql / Mariadb.
If your indexes are compound indexes, like this
the query planner will probably choose to scan the index instead of the whole table. That’s usually faster, but still O(n).
Keep in mind that it’s an antipattern to use
SELECT *
in performance critical queries unless you know for certain you need every column in the table.It is possible to turn the query into O(1). But it requires that the start-end ranges be non-overlapping. And, if you are willing to have extra rows for the gaps, you can (and should) get rid of one of the columns.
The query will become
If you might get nothing then that is solved one of these ways:
If you keep
endnum
, then verify that the returnedendnum
is< 19
.Check after the
LIMIT
(can’t be done before without messy syntax like this):Eliminate column
endnum
. In this case, it should be obvious whether you got a valid row or a "gap" row. (This assumes you keepstartnum
; a similar query could be written in you prefer to keep onlyendnum
.) Note that all possible startnum values are represented in some row by being >= the current startnum and < the next startnum.I discuss this in my blog on how to deal with IP address ranges: http://mysql.rjweb.org/doc.php/ipranges