The database used is PostgreSQL. I have a query that combines two tables and does a where search on some category and then does order by by some property. The table products has a 10+ million products. If I put a random name on category name that doesn’t exist the query should return 0 rows and it does but the problem is that the query becomes super slow and starts executing for 80+ seconds when used with Order By but when I remove order by it is fast and only when I query for the category that doesn’t exist. Planner decides to go through all the rows and it becomes super slow.
Query:
explain analyze SELECT p.* FROM products p
INNER JOIN product_categories pc ON p.id = pc.product_id
WHERE pc.category_id = 'asf'
ORDER BY p.updated_at DESC LIMIT 21;
Limit (cost=1001.02..26849.88 rows=21 width=240) (actual time=35099.387..35396.240 rows=0 loops=1)
-> Gather Merge (cost=1001.02..6249039.48 rows=5076 width=240) (actual time=35099.385..35396.237 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (cost=1.00..6247453.56 rows=2115 width=240) (actual time=35061.491..35061.492 rows=0 loops=3)
-> Parallel Index Scan Backward using idx_products_updated_at on products p (cost=0.43..2596897.52 rows=5482982 width=240) (actual time=0.057..4123.748 rows=4563924 loops=3)
-> Index Scan using idx_product_categories_product_id on product_categories pc (cost=0.56..0.66 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=13691771)
Index Cond: (product_id = p.id)
Filter: (category_id = 'asf'::text)
Rows Removed by Filter: 1
Planning Time: 1.157 ms
Execution Time: 35396.346 ms
I tried to set SET enable_nestloop = off; and it started working perfectly again, but I don’t know if it’s a good thing to set this to off and if not how to avoid this problem? It only happens when I use a made up category it goes crazy on the speed, but when the category exists it is fast. The order by is the culprit as when I remove it, it starts going fast.
2
Answers
Try using EXISTS subquery, if you use this query then there will be no need of direct joining of the tables or filtering by category.
Try the following code;
Since you didn’t share the DDL for both tables and their indexes, we can only guess that there isn’t an index on category_id. And most likely you benefit most from a combination with product_id:
Give it a try, and please share the new query plan.