I have the following table in PostgreSQL v14:
postgres=# d orders
Table "public.orders"
Column | Type | Collation | Nullable | Default
-------------+---------+-----------+----------+------------------------------------
id | integer | | not null | nextval('orders_id_seq'::regclass)
customer_id | integer | | not null |
quantity | integer | | |
product_id | integer | | |
Indexes:
"orders_pkey" PRIMARY KEY, btree (id)
"orders_customer_id_idx" btree (customer_id)
"orders_product_id_idx" btree (product_id)
"orders_quantity_idx" btree (quantity) WHERE quantity IS NULL
Now I want a query to find all "issues" with orders, where an issue is some condition. Here is a query that would find issues of either the product or quantity being null:
select orders.id, orders.customer_id, issues.issue
from orders,
lateral (
select 'missing_quantity' as issue where orders.quantity is null
union all
select 'missing_product_id' where orders.product_id is null
) issues;
This returns the correct result, however even though there should only be a handful of issues (e.g., <1% of the table), Postgres requires a sequential scan.
postgres=# set enable_seqscan = off;
SET
postgres=# explain ....;
QUERY PLAN
-------------------------------------------------------------------------------------
Nested Loop (cost=10000000000.00..10000000855.24 rows=20006 width=40)
-> Seq Scan on orders (cost=10000000000.00..10000000155.03 rows=10003 width=16)
-> Append (cost=0.00..0.05 rows=2 width=32)
-> Result (cost=0.00..0.01 rows=1 width=32)
One-Time Filter: (orders.quantity IS NULL)
-> Result (cost=0.00..0.01 rows=1 width=32)
One-Time Filter: (orders.product_id IS NULL)
JIT:
Functions: 6
Options: Inlining true, Optimization true, Expressions true, Deforming true
(10 rows)
If I manually pull out the various conditions, then PostgreSQL can perform the more optimal bitmap heap:
postgres=# set enable_seqscan = on;
SET
postgres=# explain select orders.id, orders.customer_id, issues.issue
from orders,
lateral (
select 'missing_quantity' as issue where orders.quantity is null
union all
select 'missing_product_id' where orders.product_id is null
) issues
where orders.quantity is null or orders.product_id is null;
QUERY PLAN
------------------------------------------------------------------------------------------------
Nested Loop (cost=8.43..15.44 rows=4 width=40)
-> Bitmap Heap Scan on orders (cost=8.43..15.30 rows=2 width=16)
Recheck Cond: ((quantity IS NULL) OR (product_id IS NULL))
-> BitmapOr (cost=8.43..8.43 rows=2 width=0)
-> Bitmap Index Scan on orders_quantity_idx (cost=0.00..4.13 rows=1 width=0)
Index Cond: (quantity IS NULL)
-> Bitmap Index Scan on orders_product_id_idx (cost=0.00..4.29 rows=1 width=0)
Index Cond: (product_id IS NULL)
-> Append (cost=0.00..0.05 rows=2 width=32)
-> Result (cost=0.00..0.01 rows=1 width=32)
One-Time Filter: (orders.quantity IS NULL)
-> Result (cost=0.00..0.01 rows=1 width=32)
One-Time Filter: (orders.product_id IS NULL)
(13 rows)
Is there a way I can rewrite this query such that:
1.) I don’t need to "duplicate" the conditions inside of my lateral subquery, and…
2.) It won’t require the slower sequential scan?
2
Answers
How about this:
The subquery with the
union
in it retrieves theid
values of orders with issues. The innerjoin
operation fetches the order details fromorders
while only including the orders with issues. (If an order has more than one issue it will appear duplicated in the result set.)This will, probably (I don’t have your data), yield two index scans and a nested loop for the outer query.
Wouldn’t this query give identical results?
Or if not, please explain what the difference would be.