I have a single table with ~1.5 million entries and this query:
SELECT id
FROM abc.items as I
where revisit < '2022-11-17T00:00:00Z'
AND type = 10
AND firstdelivery = (
select min(firstdelivery)
from abc.items
WHERE I.polizzennummer = polizzennummer
)
LIMIT 50
This takes about 120-180 seconds to run.
The revisit value is the part that changes the most; this query runs every day.
Table schema:
'id','int(10) unsigned','NO','PRI',NULL,'auto_increment'
'polizzennummer','varchar(64)','YES','MUL',NULL,''
'type','tinyint(4)','YES','MUL',NULL,''
'revisit','datetime','YES','MUL',NULL,''
'status','tinyint(4)','YES','',NULL,''
'ts','datetime','YES','',NULL,''
'grund','text','YES','',NULL,''
'firstdelivery','datetime','YES','MUL',NULL,''
Explain Output:
'1', 'PRIMARY', 'I', NULL, 'ref', 'idx_abc_type,idx_abc_revisit,idx_abc_type_revisit_firstdelivery', 'idx_abc_type', '2', 'const', '499036', '50.00', 'Using where'
'2', 'DEPENDENT SUBQUERY', 'abc', NULL, 'ref', 'polizzennummer,idx_abc_polizzennummer_firstdelivery', 'idx_abc_polizzennummer_firstdelivery', '195', 'abc.I.polizzennummer', '7', '100.00', 'Using index'
The table's index output:
'abc', '0', 'PRIMARY', '1', 'id', 'A', '998072', NULL, NULL, '', 'BTREE', '', ''
'abc', '0', 'id_UNIQUE', '1', 'id', 'A', '998072', NULL, NULL, '', 'BTREE', '', ''
'abc', '1', 'polizzennummer', '1', 'polizzennummer', 'A', '117288', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type', '1', 'type', 'A', '10', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_revisit', '1', 'revisit', 'A', '93553', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_firstdelivery', '1', 'firstdelivery', 'A', '998072', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_polizzennummer_firstdelivery', '1', 'polizzennummer', 'A', '125237', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_polizzennummer_firstdelivery', '2', 'firstdelivery', 'A', '998072', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type_revisit_firstdelivery', '1', 'type', 'A', '10', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type_revisit_firstdelivery', '2', 'revisit', 'A', '116078', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type_revisit_firstdelivery', '3', 'firstdelivery', 'A', '998072', NULL, NULL, 'YES', 'BTREE', '', ''
I'm not sure how I could optimize this query further.
Any help or suggestion is appreciated.
3 Answers
I can’t really figure out your indices from what you included in the question. Here are the indices I would suggest for this query. For the correlated subquery, use:
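Something like the following (a sketch; the index name is a placeholder, and note that it matches your existing idx_abc_polizzennummer_firstdelivery, so you may already have this one):

-- Equality lookup on polizzennummer, then MIN(firstdelivery) can be read
-- from the first matching index entry; the subquery is covered by the index.
CREATE INDEX idx_items_polizzennummer_firstdelivery
    ON abc.items (polizzennummer, firstdelivery);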
This index should allow rapid evaluation of the subquery for each row in the outer query. For the outer query itself, you may try:
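For example (again a sketch; the name is a placeholder, and this is essentially your existing idx_abc_type_revisit_firstdelivery):

-- type is tested with '=', revisit with a range, so type must come first;
-- firstdelivery is appended so the equality against the subquery result
-- can also be checked from the index.
CREATE INDEX idx_items_type_revisit_firstdelivery
    ON abc.items (type, revisit, firstdelivery);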
This covers the WHERE clause of the outer query. Note that your current query uses LIMIT without ORDER BY, which does not make much sense; you should add an ORDER BY clause to the outer query.

You want the row with the earliest firstdelivery per polizzennummer, and you only show it if its type is 10 and its revisit date is before 2022-11-17. But to do this you select all candidates, and for each candidate you select all rows with the same polizzennummer. This can mean that you run the same subquery again and again.
Three approaches can help here:
Idea #1: Get the minimum firstdelivery once per polizzennummer
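A sketch of what this could look like (assuming MySQL; all names are taken from your question):

SELECT I.id
FROM abc.items AS I
JOIN (
    -- one pass over the table: earliest firstdelivery per polizzennummer
    SELECT polizzennummer, MIN(firstdelivery) AS min_firstdelivery
    FROM abc.items
    GROUP BY polizzennummer
) AS m
  ON  m.polizzennummer    = I.polizzennummer
  AND m.min_firstdelivery = I.firstdelivery
WHERE I.revisit < '2022-11-17T00:00:00Z'
  AND I.type = 10
LIMIT 50;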
Idea #2: Stop looking when you find an earlier firstdelivery
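A sketch using NOT EXISTS, which lets the engine stop scanning a polizzennummer's rows as soon as one earlier firstdelivery is found:

SELECT I.id
FROM abc.items AS I
WHERE I.revisit < '2022-11-17T00:00:00Z'
  AND I.type = 10
  AND NOT EXISTS (
        -- any row with the same polizzennummer and an earlier firstdelivery
        -- disqualifies I; the search can stop at the first hit
        SELECT 1
        FROM abc.items AS older
        WHERE older.polizzennummer = I.polizzennummer
          AND older.firstdelivery < I.firstdelivery
  )
LIMIT 50;

One caveat: a row whose firstdelivery is NULL is excluded by your original query but could be returned here, so check that case if it can occur.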
Idea #3: Find the minimum firstdelivery for each row and continue working on that dataset
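A sketch using a window function (requires MySQL 8.0 or later): the per-polizzennummer minimum is computed alongside every row, and the outer query then keeps only the rows that equal it:

SELECT t.id
FROM (
    SELECT id, revisit, type, firstdelivery,
           -- minimum over all rows sharing this row's polizzennummer
           MIN(firstdelivery) OVER (PARTITION BY polizzennummer) AS min_firstdelivery
    FROM abc.items
) AS t
WHERE t.revisit < '2022-11-17T00:00:00Z'
  AND t.type = 10
  AND t.firstdelivery = t.min_firstdelivery
LIMIT 50;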
As to indexes: your index idx_abc_polizzennummer_firstdelivery on abc.items (polizzennummer, firstdelivery) should suffice for the queries above.
A LIMIT without an ORDER BY is somewhat meaningless: which 50 rows do you want?

A "dependent subquery" in a WHERE clause is quite costly, so try this reformulation:
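A sketch of one possible rewrite along these lines (the derived-table alias and column alias are placeholders):

SELECT I.id
FROM (
    -- earliest firstdelivery per polizzennummer, computed once
    SELECT polizzennummer, MIN(firstdelivery) AS min_fd
    FROM abc.items
    -- add "WHERE type = 10" here only if the minimum should be taken
    -- over type-10 rows alone (see the note below)
    GROUP BY polizzennummer
) AS m
JOIN abc.items AS I
    ON  I.polizzennummer = m.polizzennummer
    AND I.firstdelivery  = m.min_fd
WHERE I.revisit < '2022-11-17T00:00:00Z'
  AND I.type = 10
LIMIT 50;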
after adding these composite indexes:
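Something along these lines (index names are placeholders and the exact column orders are an educated guess from the query shape):

-- for the derived table: GROUP BY polizzennummer with MIN(firstdelivery)
CREATE INDEX idx_items_poli_first
    ON abc.items (polizzennummer, firstdelivery);

-- for the outer lookup: join on polizzennummer + firstdelivery,
-- then filter on type and revisit from the index
CREATE INDEX idx_items_poli_first_type_revisit
    ON abc.items (polizzennummer, firstdelivery, type, revisit);

-- the third index, only needed if the derived table is limited to type = 10
CREATE INDEX idx_items_type_poli_first
    ON abc.items (type, polizzennummer, firstdelivery);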
I am unclear whether the derived table needs to limit its rows to type = 10. If it does, the third index is needed.