skip to Main Content

I am wondering if the following MySql SELECT query takes O(N) or O(logN).

Let us we have a table represents 4 integer intervals [startNum, endNum].
And the table is indexed by the startNum and endNum columns.

startNum, endNum
3, 8
10, 15
16, 21
28, 42

Query:

SELECT * from table
where startNum <= 19 AND endNum >= 19

I think MySql will take O(N), because it will

 1. find the first 3 rows using the "startNum"; then 
 2. go through each of them and use the "endNum" to identify the 3rd row; then 
 3. return the 3rd row [16, 21] as the result.

Is MySql "smart" enough to do the following?

1. binary search on the startNum to find the position of the 3rd row, since "startNum" is sorted; then
2. binary search on the endNum to find the 3rd row again, since "endNum" is also sorted; then
3. return the 3rd row [16, 21] as the result.

From this documentation: https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html

If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE, the
optimizer uses it but considers no more key parts.

I don’t think MySql is doing the "smart" binary searches.

Am I right?
Is there any configuration that can enable MySql do the binary searches?

2

Answers


  1. You are correct. It’s O(n). If you have indexes on both startNum and endNum, the query planner will choose one index. Based on table statistics it will try to choose the index that is more selective.

    Then it will random-access that index to the first eligible row and proceed to scan the rest of the table to satisfy the other inequality predicate. This is in the nature of BTREE indexing. The situation is the same in every make of table server that uses BTREE indexing, not just MySql / Mariadb.

    If your indexes are compound indexes, like this

    ALTER TABLE `table`
      ADD INDEX start_end (startNum, endNum),
      ADD INDEX end_start (endNum, startNum);
    

    the query planner will probably choose to scan the index instead of the whole table. That’s usually faster, but still O(n).

    Keep in mind that it’s an antipattern to use SELECT * in performance critical queries unless you know for certain you need every column in the table.

    Login or Signup to reply.
  2. It is possible to turn the query into O(1). But it requires that the start-end ranges be non-overlapping. And, if you are willing to have extra rows for the gaps, you can (and should) get rid of one of the columns.

    The query will become

    SELECT *
        FROM table
        WHERE startnum >= 19
        ORDER BY startnum
        LIMIT 1;
    

    If you might get nothing then that is solved one of these ways:

    • If you keep endnum, then verify that the returned endnum is < 19.

    • Check after the LIMIT (can’t be done before without messy syntax like this):

        SELECT *
            FROM ( the above query ) AS x
            WHERE endnum <= 19;
      
    • Eliminate column endnum. In this case, it should be obvious whether you got a valid row or a "gap" row. (This assumes you keep startnum; a similar query could be written in you prefer to keep only endnum.) Note that all possible startnum values are represented in some row by being >= the current startnum and < the next startnum.

    I discuss this in my blog on how to deal with IP address ranges: http://mysql.rjweb.org/doc.php/ipranges

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search