Postgresql - How to optimize lateral query without duplicating conditions

MarkHildreth
September 19, 2023
253 views
0 votes
2 Answers

I have the following table in PostgreSQL v14:

postgres=# d orders
                               Table "public.orders"
   Column    |  Type   | Collation | Nullable |              Default
-------------+---------+-----------+----------+------------------------------------
 id          | integer |           | not null | nextval('orders_id_seq'::regclass)
 customer_id | integer |           | not null |
 quantity    | integer |           |          |
 product_id  | integer |           |          |
Indexes:
    "orders_pkey" PRIMARY KEY, btree (id)
    "orders_customer_id_idx" btree (customer_id)
    "orders_product_id_idx" btree (product_id)
    "orders_quantity_idx" btree (quantity) WHERE quantity IS NULL

Now I want a query to find all "issues" with orders, where an issue is some condition. Here is a query that would find issues of either the product or quantity being null:

select orders.id, orders.customer_id, issues.issue
from orders,
lateral (
  select 'missing_quantity' as issue where orders.quantity is null
  union all
  select 'missing_product_id' where orders.product_id is null
) issues;

This returns the correct result, however even though there should only be a handful of issues (e.g., <1% of the table), Postgres requires a sequential scan.

postgres=# set enable_seqscan = off;
SET
postgres=# explain ....;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Nested Loop  (cost=10000000000.00..10000000855.24 rows=20006 width=40)
   ->  Seq Scan on orders  (cost=10000000000.00..10000000155.03 rows=10003 width=16)
   ->  Append  (cost=0.00..0.05 rows=2 width=32)
         ->  Result  (cost=0.00..0.01 rows=1 width=32)
               One-Time Filter: (orders.quantity IS NULL)
         ->  Result  (cost=0.00..0.01 rows=1 width=32)
               One-Time Filter: (orders.product_id IS NULL)
 JIT:
   Functions: 6
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(10 rows)

If I manually pull out the various conditions, then PostgreSQL can perform the more optimal bitmap heap:

postgres=# set enable_seqscan = on;
SET
postgres=# explain select orders.id, orders.customer_id, issues.issue
from orders,
lateral (
  select 'missing_quantity' as issue where orders.quantity is null
  union all
  select 'missing_product_id' where orders.product_id is null
) issues
where orders.quantity is null or orders.product_id is null;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Nested Loop  (cost=8.43..15.44 rows=4 width=40)
   ->  Bitmap Heap Scan on orders  (cost=8.43..15.30 rows=2 width=16)
         Recheck Cond: ((quantity IS NULL) OR (product_id IS NULL))
         ->  BitmapOr  (cost=8.43..8.43 rows=2 width=0)
               ->  Bitmap Index Scan on orders_quantity_idx  (cost=0.00..4.13 rows=1 width=0)
                     Index Cond: (quantity IS NULL)
               ->  Bitmap Index Scan on orders_product_id_idx  (cost=0.00..4.29 rows=1 width=0)
                     Index Cond: (product_id IS NULL)
   ->  Append  (cost=0.00..0.05 rows=2 width=32)
         ->  Result  (cost=0.00..0.01 rows=1 width=32)
               One-Time Filter: (orders.quantity IS NULL)
         ->  Result  (cost=0.00..0.01 rows=1 width=32)
               One-Time Filter: (orders.product_id IS NULL)
(13 rows)

Is there a way I can rewrite this query such that:

1.) I don’t need to "duplicate" the conditions inside of my lateral subquery, and…

2.) It won’t require the slower sequential scan?

Tags: postgresql query-optimization

Answers

- OJones
- September 6, 2023 at 6:36 pm
- 0 votes
0
How about this:
```
select orders.id, orders.customer_id, issues.issue
from orders
join (
  select 'missing_quantity' as issue, id
    from orders
    where orders.quantity is null
  union all
  select 'missing_product_id' , id
   from orders 
  where orders.product_id is null
) issues ON orders.id = issues.id
```
The subquery with the union in it retrieves the id values of orders with issues. The inner join operation fetches the order details from orders while only including the orders with issues. (If an order has more than one issue it will appear duplicated in the result set.)

This will, probably (I don’t have your data), yield two index scans and a nested loop for the outer query.
Login or Signup to reply.

- jjanes
- September 6, 2023 at 10:07 pm
- 0 votes
0
Wouldn’t this query give identical results?
```
select orders.id, orders.customer_id, 'missing_quantity' as issue from orders where quantity is null
union all
select orders.id, orders.customer_id, 'missing_product_id' from orders where product_id is null
```
Or if not, please explain what the difference would be.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – How to optimize lateral query without duplicating conditions

Answers