skip to Main Content

I have a 40M record table having id as the primary key. I execute a select statement as follows:

select * from messages where (some condition) order by id desc limit 20;

It is ok and the query executes in a reasonable time. But when I add an always valid condition as follows, It takes a huge time.

select * from messages where id > 0 and (some condition) order by id desc limit 20;

I guess it is a bug and makes MySQL search from the top side of the table instead of the bottom side. If there is any other justification or optimization it would be great a help.

p.s. with a high probability, the results are found in the last 10% records of my table.

p.p.s. the some condition is like where col1 = x1 and col2 = x2 where col1 and col2 are indexed.

3

Answers


  1. It’s not a bug. You are searching through a 40 million table where your where clause doesn’t have an index. Add an index on the column in your where clause and you will see substantial improvement.

    Login or Signup to reply.
  2. MySQL has to choose whether to use an index to process the WHERE clause, or use an index to control ORDER BY ... LIMIT ....

    In the first query, the WHERE clause can’t make effective use of an index, so it prefers to use the primary key index to optimize scanning in order by ORDER BY. In this case it stops when it finds 20 results that satisfy the WHERE condition.

    In the second query, the id > 0 condition in the WHERE clause can make use of the index, so it prefers to use that instead of using the index for ORDER BY. In this case, it has to find all the results that match the WHERE condition, and then sort them by id.

    I wouldn’t really call this a bug, as there’s no specification of precisely how a query should be optimized. It’s not always easy for the query planner to determine the best way to make use of indexes. Using the index to filter the rows using WHERE id > x could be better if there aren’t many rows that match that condition.

    Login or Signup to reply.
  3. A query like this

    select  *
        from  messages
        where  col1 = x1
          and  col2 = x2
        order by  id desc
        limit  20;
    

    is best handled by a ‘composite’ index with the tests for ‘=’ first:

    INDEX(col1, col2, id)
    

    Try it, I think it will be faster than either of the queries you are working with.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search