
Ex.

SELECT *
FROM A
JOIN B ON A.idx = B.idx
JOIN C ON A.idx = C.idx
WHERE A.create_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR A.last_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR B.create_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR B.last_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR C.create_dt BETWEEN '2023-05-01' AND '2023-05-31'
  OR C.last_dt BETWEEN '2023-05-01' AND '2023-05-31';

This is a PostgreSQL database.

While joining these three tables, I want to SELECT only the rows where the create_dt or last_dt of each table is the latest.
However, the query is far too slow. How can I speed it up?

Performance is fine when I keep only the WHERE conditions on table A.

2 Answers


  1. You might try adding the following indices to the three tables:

    CREATE INDEX idx_a ON A (idx, create_dt, last_dt);
    CREATE INDEX idx_b ON B (idx, create_dt, last_dt);
    CREATE INDEX idx_c ON C (idx, create_dt, last_dt);
    

    These indices, if used, should speed up the joins in your query.
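
    Whether the planner actually picks these indices can be checked with EXPLAIN. A minimal sketch, assuming the table and column names from the question; note that the ANALYZE option really executes the query, so it takes as long as the query itself:

    ```sql
    -- Show the chosen plan together with actual timings and buffer usage.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM A
    JOIN B ON A.idx = B.idx
    JOIN C ON A.idx = C.idx
    WHERE A.create_dt BETWEEN '2023-05-01' AND '2023-05-31'
      OR A.last_dt BETWEEN '2023-05-01' AND '2023-05-31'
      OR B.create_dt BETWEEN '2023-05-01' AND '2023-05-31'
      OR B.last_dt BETWEEN '2023-05-01' AND '2023-05-31'
      OR C.create_dt BETWEEN '2023-05-01' AND '2023-05-31'
      OR C.last_dt BETWEEN '2023-05-01' AND '2023-05-31';
    ```

    If the plan still shows sequential scans on all three tables, the indices are not being used for this query.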

  2. "I want to SELECT only the data where the create_dt or last_dt of each table is the latest."

    Then you cannot join on idx, since the latest row of each table will generally not share the same idx value.

    Your query does not do what you say. This one does:

    SELECT *
    FROM  (
       (  -- row of a with the latest create_dt in range
       SELECT GREATEST(create_dt, last_dt) AS latest_a, *
       FROM   a
       WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
       ORDER  BY create_dt DESC
       LIMIT  1
       )
       UNION ALL
       (  -- row of a with the latest last_dt in range
       SELECT GREATEST(create_dt, last_dt) AS latest_a, *
       FROM   a
       WHERE  last_dt BETWEEN '2023-05-01' AND '2023-05-31'
       ORDER  BY last_dt DESC
       LIMIT  1
       )
       ORDER  BY latest_a DESC  -- keep the later of the two candidates
       LIMIT  1
       ) a
    CROSS JOIN  (
       (
       SELECT GREATEST(create_dt, last_dt) AS latest_b, *
       FROM   b
       WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
       ORDER  BY create_dt DESC
       LIMIT  1
       )
       UNION ALL
       (
       SELECT GREATEST(create_dt, last_dt) AS latest_b, *
       FROM   b
       WHERE  last_dt BETWEEN '2023-05-01' AND '2023-05-31'
       ORDER  BY last_dt DESC
       LIMIT  1
       )
       ORDER  BY latest_b DESC
       LIMIT  1
       ) b
    CROSS JOIN  (
       (
       SELECT GREATEST(create_dt, last_dt) AS latest_c, *
       FROM   c
       WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
       ORDER  BY create_dt DESC
       LIMIT  1
       )
       UNION ALL
       (
       SELECT GREATEST(create_dt, last_dt) AS latest_c, *
       FROM   c
       WHERE  last_dt BETWEEN '2023-05-01' AND '2023-05-31'
       ORDER  BY last_dt DESC
       LIMIT  1
       )
       ORDER  BY latest_c DESC
       LIMIT  1
       ) c;  -- deliberately no USING (idx)
    

    All parentheses are required.
    It is a bit verbose, but it is about as fast as this gets, provided you have these indexes:

    CREATE INDEX a_create_dt_idx ON A (create_dt);
    CREATE INDEX a_last_dt_idx ON A (last_dt);
    
    CREATE INDEX b_create_dt_idx ON B (create_dt);
    CREATE INDEX b_last_dt_idx ON B (last_dt);
    
    CREATE INDEX c_create_dt_idx ON C (create_dt);
    CREATE INDEX c_last_dt_idx ON C (last_dt);
    

    That makes two index scans per table, each directly picking the one qualifying row.
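
    To check that a single branch is answered from its index, you can run EXPLAIN on it in isolation. A sketch for the create_dt branch of table a, assuming the index a_create_dt_idx from above exists; the ANALYZE option executes the query:

    ```sql
    -- Inspect the plan of one branch on its own.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT GREATEST(create_dt, last_dt) AS latest_a, *
    FROM   a
    WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
    ORDER  BY create_dt DESC
    LIMIT  1;
    ```

    One would hope to see an index scan on a_create_dt_idx (typically a backward one for the descending sort) rather than a sequential scan followed by a sort. The exact plan depends on your data and PostgreSQL version.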

    I am joining with an unconditional CROSS JOIN, since each subquery returns exactly one row, provided at least one qualifies.

    If one of the subqueries finds no row, the result is empty. Maybe you really want a FULL OUTER JOIN to preserve results from the other tables if one comes up empty. Or just 3 result rows.
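
    The "3 result rows" variant could look roughly like this. It is only a sketch: it assumes each table has the columns idx, create_dt and last_dt (the only ones visible in the question), the src label column is made up for illustration, and each branch is shortened to a single OR filter instead of the two index-friendly subqueries used above:

    ```sql
    -- One row per table: the row with the latest create_dt or last_dt in range.
    -- A table without a qualifying row simply contributes no row.
    (
    SELECT 'a' AS src, idx, create_dt, last_dt
         , GREATEST(create_dt, last_dt) AS latest
    FROM   a
    WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
       OR  last_dt   BETWEEN '2023-05-01' AND '2023-05-31'
    ORDER  BY latest DESC
    LIMIT  1
    )
    UNION ALL
    (
    SELECT 'b' AS src, idx, create_dt, last_dt
         , GREATEST(create_dt, last_dt) AS latest
    FROM   b
    WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
       OR  last_dt   BETWEEN '2023-05-01' AND '2023-05-31'
    ORDER  BY latest DESC
    LIMIT  1
    )
    UNION ALL
    (
    SELECT 'c' AS src, idx, create_dt, last_dt
         , GREATEST(create_dt, last_dt) AS latest
    FROM   c
    WHERE  create_dt BETWEEN '2023-05-01' AND '2023-05-31'
       OR  last_dt   BETWEEN '2023-05-01' AND '2023-05-31'
    ORDER  BY latest DESC
    LIMIT  1
    );
    ```

    Since the rows are stacked rather than joined, an empty table no longer wipes out the results from the other two.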

    Then again, I wouldn’t be surprised if you didn’t exactly say what you really need.
