Postgresql - Optimize the query which currently uses sub-query

AndreyDeineko
August 9, 2023
165 views
0 votes
2 Answers

SELECT
  *
FROM notifications
WHERE id NOT IN (
  SELECT
    id
  FROM notifications sn
  WHERE
    sn.status = 'errored'
    AND sn.data->>'end_date' IS NULL
    AND reference IN (
      SELECT
        reference
      FROM notifications sn
      WHERE
        sn.data->>'end_date' IS NOT NULL
    )
)

id is unique – it’s a primary key.
reference is an identifier and multiple notifications can have the same reference.
notification can be in errored or other state.
If notification contains sn.data->>'end_date' it means it’s the cancelation notification.

The goal is to filter out errored notifications IF we already received cancelation notification for given reference.

My query works but it is very slow given that table is only ~200k rows and I think there’s a more efficient way to achieve the same result.

Current query’s EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS):

Seq Scan on public.notifications  (cost=24409.49..48382.49 rows=87160 width=903) (actual time=1647.547..1755.070 rows=174189 loops=1)
   Output: notifications.id, notifications.reference, notifications.created_at, notifications.updated_at, notifications.provider, notifications.status, notifi
cations.provider_customer_ref, notifications.raw_data, notifications.raw_json, notifications.product_offering, notifications.data, notifications.product_type
   Filter: (NOT (hashed SubPlan 1))
   Rows Removed by Filter: 95
   Buffers: shared hit=459591
   SubPlan 1
     ->  Nested Loop Semi Join  (cost=0.42..24409.49 rows=1 width=4) (actual time=5.261..1647.403 rows=95 loops=1)
           Output: sn.id
           Buffers: shared hit=437797
           ->  Seq Scan on public.notifications sn  (cost=0.00..24408.80 rows=1 width=32) (actual time=5.193..1640.764 rows=204 loops=1)
                 Output: sn.id, sn.reference
                 Filter: (((sn.data ->> 'end_date'::text) IS NULL) AND (sn.status = 'errored'::editor.notification_status))
                 Rows Removed by Filter: 174080
                 Buffers: shared hit=435722
           ->  Index Scan using notifications_reference on public.notifications sn_1  (cost=0.42..3.65 rows=2 width=28) (actual time=0.031..0.031 rows=0 loops=204)
                 Output: sn_1.reference
                 Index Cond: ((sn_1.reference)::text = (sn.reference)::text)
                 Filter: ((sn_1.data ->> 'end_date'::text) IS NOT NULL)
                 Rows Removed by Filter: 1
                 Buffers: shared hit=2075
 Planning:
   Buffers: shared hit=16
 Planning Time: 7.447 ms
 Execution Time: 1763.507 ms

Thanks in advance.

Tags: postgresql sql

Answers

- MikeOrganek
- August 9, 2023 at 7:11 pm
- 0 votes
0
This should be easier to read and understand. Whether it performs "better" is a different story:
```
with cancellations as (
  select reference
    from notifications 
   where sn.data->>'end_date' is not null
)
select *
  from notifications sn
       left join cancellations c on c.reference = sn.reference
 where not (    sn.status = 'errored'
            and sn.data->>'end_date' is null
            and c.reference is not null)
```
The antijoin expressed as a left join can be changed to either a not in (usually a bad idea because of nulls) or an exists correlated subquery.
Login or Signup to reply.

- FrankHeikens
- August 9, 2023 at 11:54 pm
- 0 votes
0
See the query plan, this is your problem:
```
       ->  Seq Scan on public.notifications sn  (cost=0.00..24408.80 
             rows=1 width=32) (actual time=5.193..1640.764
                 rows=204 loops=1)
             Output: sn.id, sn.reference
             Filter: (((sn.data ->> 'end_date'::text) IS NULL) AND (sn.status = 'errored'::editor.notification_status))
             Rows Removed by Filter: 174080
```
The databases removes 174080 records, to find 204 records. Create an index on the combination of these columns and it should be able to do an index scan, within a few milliseconds.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Optimize the query which currently uses sub-query

Answers