If going to some very end of the pages, it will still fully scan all the previous records (on Disk).
One of the solution is to get the ID of the last previous record and then use where to filter (where id > previous ID Limit 0, 20 ). But if the SQL is including group up and sum, this method will not be work as the number of sum will be wrong.
Does anyone have any solution?
I know that if you want better performance, you won’t use mysql. But I still like to know is there a method to optimize it.
For Example:
SELECT
productID,
locationID,
SUM(qty) AS total_qty
FROM
inventory
WHERE
create_date BETWEEN 'yyyy-mm-dd' AND 'yyyy-mm-dd'
GROUP BY
productID,
locationID
ORDER BY
productID,
locationID;
limit 100, 20;
Assuming the inventory log table has the following sample data:
id productID locationID qty create_date
1 101 1 10 2024-01-01 10:00:00
2 101 1 -2 2024-01-02 12:00:00
3 102 1 5 2024-01-03 14:00:00
4 101 2 20 2024-01-01 09:00:00
5 101 1 3 2024-01-04 16:00:00
6 102 1 -1 2024-01-05 18:00:00
With the given sql, the result will be:
SELECT
productID,
locationID,
SUM(qty) AS total_qty
FROM
inventory
WHERE
create_date BETWEEN '2024-01-01' AND '2024-01-04'
GROUP BY
productID,
locationID
ORDER BY
productID,
locationID;
Result:
productID locationID total_qty
101 1 11
101 2 20
102 1 5
But there will not only 3 records, so I would like to do pagination with limit. But I found that with (limit), it will still scan the all records before the offset on the disk. And it seem not possible to filter by keyset pagination as the id is not sequential and related.
2
Answers
You may consider adding the following covering index:
This index, if used, should let MySQL discard any records with out of range
create_date
values. Post that, MySQL can scan the entire index and aggregate by the 2 columns in theGROUP BY
clause. The index contains these 2 aggregate columns as well asqty
.As @Thorsten has pointed out in his comment below, your current query is invalid (at least with
ONLY_FULL_GROUP_BY
mode enabled), as it includescreate_date
in the select clause while omitting it from theGROUP BY
clause. Here is one possible corrected version:OFFSET
is the real problem. The processing has to skip over 100 rows before getting and delivering the desired 20. But because of theWHERE
andGROUP BY
, there is no easy way to avoid the overhead.I assume you have millions of rows, correct? (Otherwise, I would consider the performance problem minimal.)
I recommend two techniques for speeding up this query — one for shrinking the table and one for avoiding
OFFSET
.What are the date ranges like? Weekly? Daily? Arbitrary? How many rows rows per week/day/whatever? If at least 10 rows (average) per unit date range unit, building and maintaining a Summary Table .
To get rid of
OFFSET
(and its flaws), do pagination via "remember where you left off": Pagination