skip to Main Content

I have the following table in PostgreSQL v14:

postgres=# d orders
                               Table "public.orders"
   Column    |  Type   | Collation | Nullable |              Default
-------------+---------+-----------+----------+------------------------------------
 id          | integer |           | not null | nextval('orders_id_seq'::regclass)
 customer_id | integer |           | not null |
 quantity    | integer |           |          |
 product_id  | integer |           |          |
Indexes:
    "orders_pkey" PRIMARY KEY, btree (id)
    "orders_customer_id_idx" btree (customer_id)
    "orders_product_id_idx" btree (product_id)
    "orders_quantity_idx" btree (quantity) WHERE quantity IS NULL

Now I want a query to find all "issues" with orders, where an issue is some condition. Here is a query that would find issues of either the product or quantity being null:

select orders.id, orders.customer_id, issues.issue
from orders,
lateral (
  select 'missing_quantity' as issue where orders.quantity is null
  union all
  select 'missing_product_id' where orders.product_id is null
) issues;

This returns the correct result, however even though there should only be a handful of issues (e.g., <1% of the table), Postgres requires a sequential scan.

postgres=# set enable_seqscan = off;
SET
postgres=# explain ....;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Nested Loop  (cost=10000000000.00..10000000855.24 rows=20006 width=40)
   ->  Seq Scan on orders  (cost=10000000000.00..10000000155.03 rows=10003 width=16)
   ->  Append  (cost=0.00..0.05 rows=2 width=32)
         ->  Result  (cost=0.00..0.01 rows=1 width=32)
               One-Time Filter: (orders.quantity IS NULL)
         ->  Result  (cost=0.00..0.01 rows=1 width=32)
               One-Time Filter: (orders.product_id IS NULL)
 JIT:
   Functions: 6
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(10 rows)

If I manually pull out the various conditions, then PostgreSQL can perform the more optimal bitmap heap:

postgres=# set enable_seqscan = on;
SET
postgres=# explain select orders.id, orders.customer_id, issues.issue
from orders,
lateral (
  select 'missing_quantity' as issue where orders.quantity is null
  union all
  select 'missing_product_id' where orders.product_id is null
) issues
where orders.quantity is null or orders.product_id is null;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Nested Loop  (cost=8.43..15.44 rows=4 width=40)
   ->  Bitmap Heap Scan on orders  (cost=8.43..15.30 rows=2 width=16)
         Recheck Cond: ((quantity IS NULL) OR (product_id IS NULL))
         ->  BitmapOr  (cost=8.43..8.43 rows=2 width=0)
               ->  Bitmap Index Scan on orders_quantity_idx  (cost=0.00..4.13 rows=1 width=0)
                     Index Cond: (quantity IS NULL)
               ->  Bitmap Index Scan on orders_product_id_idx  (cost=0.00..4.29 rows=1 width=0)
                     Index Cond: (product_id IS NULL)
   ->  Append  (cost=0.00..0.05 rows=2 width=32)
         ->  Result  (cost=0.00..0.01 rows=1 width=32)
               One-Time Filter: (orders.quantity IS NULL)
         ->  Result  (cost=0.00..0.01 rows=1 width=32)
               One-Time Filter: (orders.product_id IS NULL)
(13 rows)

Is there a way I can rewrite this query such that:

1.) I don’t need to "duplicate" the conditions inside of my lateral subquery, and…

2.) It won’t require the slower sequential scan?

2

Answers


  1. How about this:

    select orders.id, orders.customer_id, issues.issue
    from orders
    join (
      select 'missing_quantity' as issue, id
        from orders
        where orders.quantity is null
      union all
      select 'missing_product_id' , id
       from orders 
      where orders.product_id is null
    ) issues ON orders.id = issues.id
    

    The subquery with the union in it retrieves the id values of orders with issues. The inner join operation fetches the order details from orders while only including the orders with issues. (If an order has more than one issue it will appear duplicated in the result set.)

    This will, probably (I don’t have your data), yield two index scans and a nested loop for the outer query.

    Login or Signup to reply.
  2. Wouldn’t this query give identical results?

    select orders.id, orders.customer_id, 'missing_quantity' as issue from orders where quantity is null
    union all
    select orders.id, orders.customer_id, 'missing_product_id' from orders where product_id is null
    

    Or if not, please explain what the difference would be.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search