skip to Main Content

If going to some very end of the pages, it will still fully scan all the previous records (on Disk).
One of the solution is to get the ID of the last previous record and then use where to filter (where id > previous ID Limit 0, 20 ). But if the SQL is including group up and sum, this method will not be work as the number of sum will be wrong.
Does anyone have any solution?
I know that if you want better performance, you won’t use mysql. But I still like to know is there a method to optimize it.

For Example:

SELECT 
    productID, 
    locationID, 
    SUM(qty) AS total_qty
FROM 
    inventory
WHERE 
    create_date BETWEEN 'yyyy-mm-dd' AND 'yyyy-mm-dd'
GROUP BY 
    productID, 
    locationID
ORDER BY 
    productID, 
    locationID;
limit 100, 20;

Assuming the inventory log table has the following sample data:

id  productID   locationID  qty create_date
1   101 1   10  2024-01-01 10:00:00
2   101 1   -2  2024-01-02 12:00:00
3   102 1   5   2024-01-03 14:00:00
4   101 2   20  2024-01-01 09:00:00
5   101 1   3   2024-01-04 16:00:00
6   102 1   -1  2024-01-05 18:00:00

With the given sql, the result will be:

SELECT 
    productID, 
    locationID, 
    SUM(qty) AS total_qty
FROM 
    inventory
WHERE 
    create_date BETWEEN '2024-01-01' AND '2024-01-04'
GROUP BY 
    productID, 
    locationID
ORDER BY 
    productID, 
    locationID;

Result:
productID   locationID  total_qty
101 1   11
101 2   20
102 1   5

But there will not only 3 records, so I would like to do pagination with limit. But I found that with (limit), it will still scan the all records before the offset on the disk. And it seem not possible to filter by keyset pagination as the id is not sequential and related.

2

Answers


  1. You may consider adding the following covering index:

    CREATE INDEX idx2 ON inventory (create_date, productID, locationID, qty);
    

    This index, if used, should let MySQL discard any records with out of range create_date values. Post that, MySQL can scan the entire index and aggregate by the 2 columns in the GROUP BY clause. The index contains these 2 aggregate columns as well as qty.

    As @Thorsten has pointed out in his comment below, your current query is invalid (at least with ONLY_FULL_GROUP_BY mode enabled), as it includes create_date in the select clause while omitting it from the GROUP BY clause. Here is one possible corrected version:

    SELECT productID, locationID, SUM(qty)
    FROM inventory
    WHERE create_date BETWEEN 'yyyy-mm-dd' and 'yyyy-mm-dd'
    GROUP BY productID, locationID
    ORDER BY productID
    LIMIT 100, 20;
    
    Login or Signup to reply.
  2. OFFSET is the real problem. The processing has to skip over 100 rows before getting and delivering the desired 20. But because of the WHERE and GROUP BY, there is no easy way to avoid the overhead.

    I assume you have millions of rows, correct? (Otherwise, I would consider the performance problem minimal.)

    I recommend two techniques for speeding up this query — one for shrinking the table and one for avoiding OFFSET.

    What are the date ranges like? Weekly? Daily? Arbitrary? How many rows rows per week/day/whatever? If at least 10 rows (average) per unit date range unit, building and maintaining a Summary Table .

    To get rid of OFFSET (and its flaws), do pagination via "remember where you left off": Pagination

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search