skip to Main Content

I have the following query:

SELECT *
FROM customers_cards
WHERE id NOT IN ( SELECT card_id FROM customers_card_psp)

It takes about 2sec to get executed and we need this query to get called on a highly visited page. Any idea how can I make it more optimize?

Here is the result of EXPLAIN:

enter image description here

Noted that card_id is not unique (there are 5 rows inside customers_card_psp per 1 row inside customers_cards), but adding DISTINCT in the behind of card_id doesn’t make anything better.


Noted that, using LEFT JOIN has the same performance:

SELECT cc.*
FROM customers_cards cc
LEFT JOIN customers_card_psp ccp ON ccp.card_id = cc.id
WHERE ccp.id IS null

enter image description here

2

Answers


  1. It takes about 2sec to get executed and we need this query to get called on a highly visited page.

    As presented, this query needs to read EVERY SINGLE ROW in customer_cards. Hence either:

    1. Your "highly visited page" is displaying hundreds of thousands of rows

    2. there are other filtering predicates you’ve not told us about

    3. most of the records in customer_cards are excluded from the result set on the grounds of having corresponding records in customer_car_psp

    For scenario 1 this is bad UX design – which you can’t fix in your database. But if really is the right way to solve the problem, consider caching outside the DBMS.

    For scenario 2 we can’t help you tune a query you have not shown us (nor have you included the schema)

    For scenario 3 you need to redesign your table structure

    Login or Signup to reply.
  2. These are 3 ways to say the same thing:

    SELECT *
        FROM customers_cards
        WHERE id NOT IN (
            SELECT card_id FROM customers_card_psp )
    

    vs

    SELECT *
        FROM customers_cards AS cc
        WHERE NOT EXISTS (
            SELECT 1               -- anything works here; "1" is the convention
                FROM customers_card_psp
                WHERE card_id = cc.id )
    

    vs

    SELECT cc.*
        FROM customers_cards AS cc
        LEFT JOIN customers_card_psp AS psp
               ON psp.card_id = cc.id       -- How the tables 'relate'
        WHERE psp.card_id IS NULL           -- Means "missing from psp"
    

    The NOT IN ( SELECT ... ) is notoriously inefficient and should be avoided.

    The EXISTS and LEFT JOIN are probably compiled identically, hence equally good.

    The EXISTS is called "semi-join" meaning that it only needs to find 0 or 1 row, not all 5, in order to satisfy the test.

    Related: The comparisons are probably the same without the NOT (and changing to IS NULL). That is, do not use IN; use either of the others.

    (Different versions of MySQL/MariaDB have optimized these things differently.)

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search