I have a table named "article" containing 1,145,141 records with various fields including
"id," "uiqPID," "tenant," "status," "title," "body," "user_id," "category_id," "created_at," and "updated_at."
The "status" column can have one of three values: ‘PUBLISHED,’ ‘DRAFT,’ or ‘ARCHIVED,’ with the following counts:
- PUBLISHED: 2
- DRAFT: 26,145
- ARCHIVED: 1,118,993
I have set up the following indexes:
- PRIMARY index on ‘id’
- article_abstract_unq_id index on ‘uiqPID’
- article_abstract_unq_id index on ‘tenant’
- article_status_idx index on ‘status’
- idx_composite_search index on ‘id’
- idx_composite_search index on ‘uiqPID’
- idx_composite_search index on ‘created_at’
My issue is with the performance of the following query, which took 5.7 seconds to execute:
SELECT
a.id AS id,
a.created_at AS created_at
FROM
article a
WHERE
a.status = 'ARCHIVED'
ORDER BY a.created_at DESC
LIMIT 50;
However, if I remove the WHERE condition or change it to a.status = ‘DRAFT’, the query completes within 1 second.
Upon inspecting the query plan, I noticed the difference in execution strategies. With the ‘ARCHIVED’ or ‘DRAFT’ status filter, the plan shows:
key: article_status_idx
Extra: Using index condition; Using filesort
But without the ‘ARCHIVED’ filter, the plan simply states:
key:
Extra: Using filesort
My question is: How can I optimize the query performance for filtering on ‘ARCHIVED’ status, ensuring it executes faster than the current 5.7 seconds, similar to the queries without this condition or with ‘DRAFT’ status?
3
Answers
The behavior you experienced is actually what I would expect. writing 1,118,993 number-timestamp tuples into memory takes time, especially if you order the results by
created_at
. You need to test the following:and if the time to execute this query is roughly the same as executing the query with the where clause, then it’s not the where clause that causes the performance drop, but ordering the query + loading the results into memory. Run the following as well:
Here we do not even have an order by. If this is similarly slow, then the thing you mainly wait for is to write this stuff into memory.
Anyway, do these measurements and the result will either confirm that the slowness is caused by the where or order by clause, or not. If these clauses cause the slowness, then let me know in the comment section and I will provide ways to optimize that issue. But it’s highly probable that the real problem is that you are waiting for the full query to be executed.
Maybe it’s better to break the query into partitions and run queries with limit 0, 10000 and then limit 10000, 10000 and so on, so you can work with the first results already while waiting for the others, maybe reducing the frustration of impatient users. But that does not change the fact that loading all this stuff takes time, even though you can make this waiting user-friendlier if the showing of partial results makes some useful sense.
EDIT
If all else fails, you could create a table like this:
And then you can select from
article_archive
, like:You can even join that with
article
byid
.Of course, you can try creating a (status, created_at) key before you create a copy table.
Here’s an alternative approach that may improve speed, try it:
Explanation:
This query utilizes a subquery to first identify the id values of the 50 most recent ‘ARCHIVED’ records based on the created_at column. It then performs a join to retrieve the additional columns (id and created_at) from the main table article. This approach can be beneficial when you want to optimize queries with complex filtering and sorting.
Please ensure that the article_status_idx index on the status column and the idx_composite_search index on the created_at and id columns are well-maintained for efficient execution.
If you go with this advice, don’t forget my score, go and enjoy yourself.
You can create a composite index matching the query exactly:
Thus the DBMS can go to status = ‘ARCHIVED’ in the index, read the first 50 entries and be done with the task.
https://dev.mysql.com/doc/refman/8.0/en/descending-indexes.html