skip to Main Content

I have a single table with ~ 1.5 mio entries and this query:

SELECT id 
FROM abc.items as I 
where revisit < '2022-11-17T00:00:00Z' 
AND type = 10 
AND firstdelivery = (
                        select min(firstdelivery) 
                        from abc.items 
                        WHERE I.polizzennummer = polizzennummer
                    ) 
LIMIT 50

Which takes about 120-180 seconds to run.
The revisit is the part which changes the most, this is a query which runs every day.

Table schema:

'id','int(10) unsigned','NO','PRI',NULL,'auto_increment'
'polizzennummer','varchar(64)','YES','MUL',NULL,''
'type','tinyint(4)','YES','MUL',NULL,''
'revisit','datetime','YES','MUL',NULL,''
'status','tinyint(4)','YES','',NULL,''
'ts','datetime','YES','',NULL,''
'grund','text','YES','',NULL,''
'firstdelivery','datetime','YES','MUL',NULL,''

Explain Output:

'1', 'PRIMARY', 'I', NULL, 'ref', 'idx_abc_type,idx_abc_revisit,idx_abc_type_revisit_firstdelivery', 'idx_abc_type', '2', 'const', '499036', '50.00', 'Using where'
'2', 'DEPENDENT SUBQUERY', 'abc', NULL, 'ref', 'polizzennummer,idx_abc_polizzennummer_firstdelivery', 'idx_abc_polizzennummer_firstdelivery', '195', 'abc.I.polizzennummer', '7', '100.00', 'Using index'

The table index output:

'abc', '0', 'PRIMARY', '1', 'id', 'A', '998072', NULL, NULL, '', 'BTREE', '', ''
'abc', '0', 'id_UNIQUE', '1', 'id', 'A', '998072', NULL, NULL, '', 'BTREE', '', ''
'abc', '1', 'polizzennummer', '1', 'polizzennummer', 'A', '117288', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type', '1', 'type', 'A', '10', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_revisit', '1', 'revisit', 'A', '93553', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_firstdelivery', '1', 'firstdelivery', 'A', '998072', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_polizzennummer_firstdelivery', '1', 'polizzennummer', 'A', '125237', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_polizzennummer_firstdelivery', '2', 'firstdelivery', 'A', '998072', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type_revisit_firstdelivery', '1', 'type', 'A', '10', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type_revisit_firstdelivery', '2', 'revisit', 'A', '116078', NULL, NULL, 'YES', 'BTREE', '', ''
'abc', '1', 'idx_abc_type_revisit_firstdelivery', '3', 'firstdelivery', 'A', '998072', NULL, NULL, 'YES', 'BTREE', '', ''

Im not sure how i could optimize this query further.

Any help or suggestion is appreciated

3

Answers


  1. I can’t really figure out your indices from what you included in the question. Here are the indices I would suggest for this query. For the correlated subquery, use:

    CREATE INDEX idx1 ON table1.items (polizzennummer, firstdelivery);
    

    This index should allow for rapid evaluation given each row in the outer query. For the outer query itself, you may try:

    CREATE INDEX idx2 ON abc.items (type, revisit, firstdelivery);
    

    This covers the WHERE clause of the outer query.

    Note that your current query is using LIMIT without ORDER BY, which does not make much sense. You should add an ORDER BY clause to the outer query.

    Login or Signup to reply.
  2. You want the first delivery and only show it if its type is 10 and its revisit date is before 2022-11-17. But in order to do this you select all candidates, and for each you select all rows with the same polizzennummer. This can mean that you run the same subquery again and again.

    Three approaches to help here:

    Idea #1: Get the minimum firstdelivery once per polizzennummer

    SELECT id 
    FROM abc.items
    WHERE revisit < '2022-11-17T00:00:00Z' 
    AND type = 10
    AND (polizzennummer, firstdelivery) IN
    (
      SELECT polizzennummer, MIN(firstdelivery)
      FROM abc.items
      GROUP BY polizzennummer
    );
    

    Idea #2: Stop looking when you find an earlier firstdelivery

    SELECT i.id 
    FROM abc.items i 
    WHERE i.revisit < '2022-11-17T00:00:00Z' 
    AND i.type = 10
    AND NOT EXISTS
    (
      SELECT null
      FROM abc.items i2
      WHERE i2.polizzennummer = i.polizzennummer
      AND i2.firstdelivery < i.firstdelivery
    );
    

    Idea #3: Find the minimum firstdelivery for each row and continue working on that dataset

    SELECT *
    FROM
    (
      SELECT i.*, MIN(firstdelivery) OVER (PARTITION BY polizzennummer) AS min_firstdelivery
      FROM abc.items i 
    ) with_min_fd
    WHERE firstdelivery = min_firstdelivery
    AND revisit < '2022-11-17T00:00:00Z' 
    AND type = 10;
    

    As to indexes: Your index idx_abc_polizzennummer_firstdelivery on abc.items(polizzennummer, firstdelivery) should suffice for above queries.

    Login or Signup to reply.
  3. A LIMIT without an ORDER BY is somewhat meaningless — which 50 rows do you want?

    A "dependent subquery" in a WHERE clause is quite costly, so…

    Try this reformulation:

    SELECT  I.id
        FROM  
        (
            SELECT  polizzennummer, min(firstdelivery)
                from  abc.items
                WHERE type = 10    -- ??
                GROUP BY  polizzennummer 
        ) AS mins
        JOIN  abc.items as I  ON I.firstdelivery = mins.polizzennummer
        where  I.revisit <  '2022-11-17T00:00:00Z'
          AND  I.type = 10
        ORDER BY  ???
        LIMIT  50
    

    after adding these composite indexes:

    INDEX(type, firstdelivery, revisit,  id)
    INDEX(polizzennummer, firstdelivery)
    INDEX(polizzennummer, type, firstdelivery)
    

    I am unclear whether the derived table needs to limit the tests to type=10. If it is needed, the third index is needed.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search