skip to Main Content

I have a SQL query when joining several large tables. My database consists of millions of rows across the users, orders and transactions table. Here is the query

    SELECT u.name, COUNT(o.id) AS order_count
    FROM users u
    JOIN orders o ON u.id = o.user_id
    JOIN transactions t ON o.id = t.order_id
    WHERE t.status = 'completed'
    GROUP BY u.name
    HAVING order_count > 10;

I ensured that there are indexes on all the necessary fields. users.id, orders.user_id and transactions.order_id. However the query becomes significantly slower as the dataset size increases.

The order table is approximately 5 million rows.
The transactions table is approximately 10 million rows.
The database is running on MySQL 8.

What optimization techniques or query restructuring could help improve performance in this case?

2

Answers


  1. This delays the name lookup until after the GROUP BY

    SELECT  u.name
            ot.order_count
        FROM (
            SELECT  o.user_id,
                    COUNT(*) AS order_count
              FROM orders o
              JOIN  transactions t  ON o.id = t.order_id
              WHERE  t.status = 'completed'
              GROUP BY  o.user_id
              HAVING  order_count > 10 ) AS ot
        LEFT JOIN users u  ON u.id = ot.user_id;
    

    This is ‘composite’ and ‘covering’ and useful for the transaction lookup. And the columns are in the optimal order:

    INDEX(status, order_id)
    

    If these suggestions do not help enough, please provide SHOW CREATE TABLE for both tables and EXPLAIN SELECT ....

    Login or Signup to reply.
  2. Try to create thoses indexes, they are covering…

    CREATE INDEX X001 ON transactions (status, order_id);
    
    CREATE INDEX X002 ON orders (user_id, id);
    
    CREATE INDEX X003 ON users (id, name);
    

    Single columns indexes are good if the query returns few rows without any aggregation.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search