skip to Main Content

The database used is PostgreSQL. I have a query that combines two tables and does a where search on some category and then does order by by some property. The table products has a 10+ million products. If I put a random name on category name that doesn’t exist the query should return 0 rows and it does but the problem is that the query becomes super slow and starts executing for 80+ seconds when used with Order By but when I remove order by it is fast and only when I query for the category that doesn’t exist. Planner decides to go through all the rows and it becomes super slow.

Query:

explain analyze SELECT p.* FROM products p
INNER JOIN product_categories pc ON p.id = pc.product_id
WHERE pc.category_id = 'asf'
ORDER BY p.updated_at DESC LIMIT 21;

    Limit  (cost=1001.02..26849.88 rows=21 width=240) (actual time=35099.387..35396.240 rows=0 loops=1)
    ->  Gather Merge  (cost=1001.02..6249039.48 rows=5076 width=240) (actual time=35099.385..35396.237 rows=0 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Nested Loop  (cost=1.00..6247453.56 rows=2115 width=240) (actual time=35061.491..35061.492 rows=0 loops=3)
            ->  Parallel Index Scan Backward using idx_products_updated_at on products p  (cost=0.43..2596897.52 rows=5482982 width=240) (actual time=0.057..4123.748 rows=4563924 loops=3)
            ->  Index Scan using idx_product_categories_product_id on product_categories pc  (cost=0.56..0.66 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=13691771)
                Index Cond: (product_id = p.id)
                Filter: (category_id = 'asf'::text)
                Rows Removed by Filter: 1
    Planning Time: 1.157 ms
    Execution Time: 35396.346 ms

I tried to set SET enable_nestloop = off; and it started working perfectly again, but I don’t know if it’s a good thing to set this to off and if not how to avoid this problem? It only happens when I use a made up category it goes crazy on the speed, but when the category exists it is fast. The order by is the culprit as when I remove it, it starts going fast.

2

Answers


  1. Try using EXISTS subquery, if you use this query then there will be no need of direct joining of the tables or filtering by category.
    Try the following code;

    EXPLAIN ANALYZE
    SELECT p.*
    FROM products p
    WHERE EXISTS (
        SELECT 1
        FROM product_categories pc
        WHERE pc.product_id = p.id
        AND pc.category_id = 'asf'
    )
    ORDER BY p.updated_at DESC
    LIMIT 21;
    
    Login or Signup to reply.
  2. Since you didn’t share the DDL for both tables and their indexes, we can only guess that there isn’t an index on category_id. And most likely you benefit most from a combination with product_id:

    CREATE INDEX idx_product_categories_product_id_category_id
    ON product_categories (product_id, category_id);
    

    Give it a try, and please share the new query plan.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search