skip to Main Content

I’m facing a performance issue with a complex SQL query in PostgreSQL and seeking your expertise to help optimize it. The query involves multiple tables and joins and seems to be taking longer to execute than expected. I believe there might be room for improvement to enhance its efficiency and speed up the results retrieval process.
Details:
Database: PostgreSQL
Tables involved: customers, orders, order_items, and products
Use case: This query aims to retrieve data about customers in the USA who placed orders for electronics products after January 1, 2022.

SELECT customers.customer_id, customers.name, products.product_id, products.product_name, 
       orders.order_id, orders.order_date, order_items.quantity, order_items.unit_price
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
JOIN order_items ON orders.order_id = order_items.order_id
JOIN products ON order_items.product_id = products.product_id
WHERE customers.country = 'USA'
AND orders.order_date >= '2022-01-01'
AND products.category = 'Electronics'
ORDER BY orders.order_date DESC;

I would greatly appreciate any insights or recommendations on how to optimize this query. My goal is to efficiently retrieve the required data while minimizing resource usage and query execution time. I’m particularly interested in any index suggestions, join optimizations, or other SQL techniques that could improve the overall performance.

3

Answers


  1. The DBMS will try to find a way to accumulate the data without building huge intermediate results. It may be looking out for the most limiting factor in your query. If you mainly have customers in Europe, customers.country = 'USA' may be the most limiting clause. The DBMS would pick those relatively few customers, look up their orders, etc. Or you may have orders since 1995 in your database. Then orders.order_date >= date '2022-01-01' would be a great way to start gathering the data. Get the relatively few orders since 2022, get their electronic items, and the US customers and be done with it. Or there may be 500 different categories, with electronics just being one of them. Or maybe none of the clauses alone is very helpful to reduce the accessed data significantly.

    I don’t know your data, and I don’t know how well your DBMS knows your data. Here are some indexes you could try:

    create index idx1 on customers (country) include customer_id, name;
    create index idx2 on orders (customer_id, order_date) include order_id;
    create index idx3 on order_items (order_id) include product_id, quantity, unit_price;
    create index idx4 on products (product_id, category) include product_name;
    
    create index idx5 on orders (order_date) include customer_id, order_id;
    create index idx6 on customers (customer_id) include country, name;
    
    create index idx7 on products (category) include product_id, product_name;
    create index idx8 on order_items (product_id) include order_id, quantity, unit_price;
    create index idx9 on orders (order_id, order_date) include customer_id;
    

    Create these. Then check which indexes are being used and drop the unused ones.

    Login or Signup to reply.
  2. If you don’t need all columns from each table, consider selecting only the columns you require.

    This reduces the amount of data that needs to be processed

    Login or Signup to reply.
  3. Create Missing INDEX:

    CREATE INDEX idx_customers_country ON customers (country);
    CREATE INDEX idx_orders_order_date ON orders (order_date);
    CREATE INDEX idx_products_category ON products (category);
    CREATE INDEX idx_order_items_order_id ON order_items (order_id);
    CREATE INDEX idx_order_items_product_id ON order_items (product_id);
    

    Optimized Query:

    WITH relevant_orders AS (
        SELECT order_id
        FROM orders
        WHERE order_date >= '2022-01-01'
    )
    SELECT c.customer_id, c.name, p.product_id, p.product_name, 
           o.order_id, o.order_date, oi.quantity, oi.unit_price
    FROM customers c
    JOIN relevant_orders ro ON c.customer_id = ro.customer_id
    JOIN order_items oi ON ro.order_id = oi.order_id
    JOIN products p ON oi.product_id = p.product_id
    WHERE c.country = 'USA'
    AND p.category = 'Electronics'
    ORDER BY o.order_date DESC;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search