Optimize MySQL Query - single table with SubQuery

user1966723
November 17, 2022
299 views
3 votes
3 Answers

I have a single table with ~ 1.5 mio entries and this query:

SELECT id 
FROM abc.items as I 
where revisit < '2022-11-17T00:00:00Z' 
AND type = 10 
AND firstdelivery = (
                        select min(firstdelivery) 
                        from abc.items 
                        WHERE I.polizzennummer = polizzennummer
                    ) 
LIMIT 50

Which takes about 120-180 seconds to run.
The revisit is the part which changes the most, this is a query which runs every day.

Table schema:

'id','int(10) unsigned','NO','PRI',NULL,'auto_increment'
'polizzennummer','varchar(64)','YES','MUL',NULL,''
'type','tinyint(4)','YES','MUL',NULL,''
'revisit','datetime','YES','MUL',NULL,''
'status','tinyint(4)','YES','',NULL,''
'ts','datetime','YES','',NULL,''
'grund','text','YES','',NULL,''
'firstdelivery','datetime','YES','MUL',NULL,''

Explain Output:

'1', 'PRIMARY', 'I', NULL, 'ref', 'idx_abc_type,idx_abc_revisit,idx_abc_type_revisit_firstdelivery', 'idx_abc_type', '2', 'const', '499036', '50.00', 'Using where'
'2', 'DEPENDENT SUBQUERY', 'abc', NULL, 'ref', 'polizzennummer,idx_abc_polizzennummer_firstdelivery', 'idx_abc_polizzennummer_firstdelivery', '195', 'abc.I.polizzennummer', '7', '100.00', 'Using index'

The table index output:

'abc', '0', 'PRIMARY', '1', 'id', 'A', '998072', NULL, NULL, '', 'BTREE', '', ''
'abc', '0', 'id_UNIQUE', '1', 'id', 'A', '998072', NULL, NULL, '', 'BTREE', '', ''
'abc', '1', 'polizzennummer', '1', 'polizzennummer', 'A', '117288', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type', '1', 'type', 'A', '10', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_revisit', '1', 'revisit', 'A', '93553', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_firstdelivery', '1', 'firstdelivery', 'A', '998072', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_polizzennummer_firstdelivery', '1', 'polizzennummer', 'A', '125237', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_polizzennummer_firstdelivery', '2', 'firstdelivery', 'A', '998072', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type_revisit_firstdelivery', '1', 'type', 'A', '10', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type_revisit_firstdelivery', '2', 'revisit', 'A', '116078', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type_revisit_firstdelivery', '3', 'firstdelivery', 'A', '998072', NULL, NULL, 'YES', 'BTREE', '', ''

Im not sure how i could optimize this query further.

Any help or suggestion is appreciated

Answers

- TimBiegeleisen
- November 17, 2022 at 3:28 pm
- 0 votes
0
I can’t really figure out your indices from what you included in the question. Here are the indices I would suggest for this query. For the correlated subquery, use:
```
CREATE INDEX idx1 ON table1.items (polizzennummer, firstdelivery);
```
This index should allow for rapid evaluation given each row in the outer query. For the outer query itself, you may try:
```
CREATE INDEX idx2 ON abc.items (type, revisit, firstdelivery);
```
This covers the WHERE clause of the outer query.

Note that your current query is using LIMIT without ORDER BY, which does not make much sense. You should add an ORDER BY clause to the outer query.
Login or Signup to reply.

- ThorstenKettner
- November 17, 2022 at 4:17 pm
- 0 votes
0
You want the first delivery and only show it if its type is 10 and its revisit date is before 2022-11-17. But in order to do this you select all candidates, and for each you select all rows with the same polizzennummer. This can mean that you run the same subquery again and again.

Three approaches to help here:

Idea #1: Get the minimum firstdelivery once per polizzennummer
```
SELECT id 
FROM abc.items
WHERE revisit < '2022-11-17T00:00:00Z' 
AND type = 10
AND (polizzennummer, firstdelivery) IN
(
  SELECT polizzennummer, MIN(firstdelivery)
  FROM abc.items
  GROUP BY polizzennummer
);
```
Idea #2: Stop looking when you find an earlier firstdelivery
```
SELECT i.id 
FROM abc.items i 
WHERE i.revisit < '2022-11-17T00:00:00Z' 
AND i.type = 10
AND NOT EXISTS
(
  SELECT null
  FROM abc.items i2
  WHERE i2.polizzennummer = i.polizzennummer
  AND i2.firstdelivery < i.firstdelivery
);
```
Idea #3: Find the minimum firstdelivery for each row and continue working on that dataset
```
SELECT *
FROM
(
  SELECT i.*, MIN(firstdelivery) OVER (PARTITION BY polizzennummer) AS min_firstdelivery
  FROM abc.items i 
) with_min_fd
WHERE firstdelivery = min_firstdelivery
AND revisit < '2022-11-17T00:00:00Z' 
AND type = 10;
```
As to indexes: Your index idx_abc_polizzennummer_firstdelivery on abc.items(polizzennummer, firstdelivery) should suffice for above queries.
Login or Signup to reply.

- RickJames
- November 18, 2022 at 3:01 am
- 0 votes
0
A LIMIT without an ORDER BY is somewhat meaningless — which 50 rows do you want?

A "dependent subquery" in a WHERE clause is quite costly, so…

Try this reformulation:
```
SELECT  I.id
    FROM  
    (
        SELECT  polizzennummer, min(firstdelivery)
            from  abc.items
            WHERE type = 10    -- ??
            GROUP BY  polizzennummer 
    ) AS mins
    JOIN  abc.items as I  ON I.firstdelivery = mins.polizzennummer
    where  I.revisit <  '2022-11-17T00:00:00Z'
      AND  I.type = 10
    ORDER BY  ???
    LIMIT  50
```
after adding these composite indexes:
```
INDEX(type, firstdelivery, revisit,  id)
INDEX(polizzennummer, firstdelivery)
INDEX(polizzennummer, type, firstdelivery)
```
I am unclear whether the derived table needs to limit the tests to type=10. If it is needed, the third index is needed.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Optimize MySQL Query – single table with SubQuery

Answers