skip to Main Content

I have a mySQL database with several tables. But one tabel contains af 1,400,000 rows. I need to get the 50,000 rows with the highest value in one field. A field that counts visitors.

SELECT uid, title, number, views 
FROM ct 
WHERE hidden = 0 
AND deleted = 0 
AND number > 0 
AND status LIKE 'active' 
order by views desc 
limit 0, 50000

It is extremely slow. The database is InnoDB and title, number and views are all indexed.

How can I speed up the selection?

From EXPLAIN:

1  SIMPLE  ct  ALL  number_index  NULL  NULL  NULL  1465440 Using where; Using filesort

After indexing Status
From EXPLAIN:

1  SIMPLE  ct  range  number_index,status  status  302  NULL  732720 Using index condition; Using where; Using filesort

3

Answers


  1. How many rows do you have matching

    WHERE hidden = 0 
    AND deleted = 0 
    AND number > 0 
    AND status LIKE 'active' 
    

    ?

    If the answer is more than 70000 or so, then the short answer is that there is nothing you can do to speed things up. If it is significantly less then you will get some improvement with an index on hidden, deleted, number and status – how much of a speed up depends on the order of these attributes in the index and the cardinality for each of the attributes (nint: you want your highest cardinality entries first).

    Login or Signup to reply.
  2. This composite index may speed it up:

    INDEX(hidden, deleted, status, number, views)
    

    The real problem is shoveling 50K rows to the client. What the heck will the client do with that many rows?

    However, "the 50,000 rows with the highest value in one field" is not what your query finds. Perhaps you first need to find "the highest value" in that field. Then search for all rows with that value in that field??

    Login or Signup to reply.
  3. Showing 50k rows is always going to be costly. Just transferring the result set over the network is going to take a while. So there’s a limit to how much you can "optimize" the query if the result set is that large. I’d seriously reconsider a design that required a result of 50k rows.

    As for the best index for this query, the usual rule applies: use a compound index, consisting of the columns in equality conditions first, then ONE column used in range conditions. In your case, I would suggest:

    alter table cte add index (hidden, deleted, status, number)
    

    The first three may be in any order, since they’re all equality conditions. Then number because it’s a range condition. There’s no way to optimize the ORDER BY, because the range condition spoils that.

    A comment asked about partitioning or other methods of optimizing. I don’t think it’s likely that partitioning pruning will help.

    The other method of optimizing I’d use is archiving. How many of the 1.4 million rows are hidden or deleted? Why not move those rows to another table, or to cold storage, or simply delete them. That would keep the active table smaller, and easier to keep in the buffer pool.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search