SELECT
*
FROM notifications
WHERE id NOT IN (
SELECT
id
FROM notifications sn
WHERE
sn.status = 'errored'
AND sn.data->>'end_date' IS NULL
AND reference IN (
SELECT
reference
FROM notifications sn
WHERE
sn.data->>'end_date' IS NOT NULL
)
)
id
is unique – it’s a primary key.reference
is an identifier and multiple notifications can have the samereference
.- notification can be in
errored
or other state. - If notification contains
sn.data->>'end_date'
it means it’s the cancelation notification.
The goal is to filter out errored notifications IF we already received cancelation notification for given reference
.
My query works but it is very slow given that table is only ~200k rows and I think there’s a more efficient way to achieve the same result.
Current query’s EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
:
Seq Scan on public.notifications (cost=24409.49..48382.49 rows=87160 width=903) (actual time=1647.547..1755.070 rows=174189 loops=1)
Output: notifications.id, notifications.reference, notifications.created_at, notifications.updated_at, notifications.provider, notifications.status, notifi
cations.provider_customer_ref, notifications.raw_data, notifications.raw_json, notifications.product_offering, notifications.data, notifications.product_type
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 95
Buffers: shared hit=459591
SubPlan 1
-> Nested Loop Semi Join (cost=0.42..24409.49 rows=1 width=4) (actual time=5.261..1647.403 rows=95 loops=1)
Output: sn.id
Buffers: shared hit=437797
-> Seq Scan on public.notifications sn (cost=0.00..24408.80 rows=1 width=32) (actual time=5.193..1640.764 rows=204 loops=1)
Output: sn.id, sn.reference
Filter: (((sn.data ->> 'end_date'::text) IS NULL) AND (sn.status = 'errored'::editor.notification_status))
Rows Removed by Filter: 174080
Buffers: shared hit=435722
-> Index Scan using notifications_reference on public.notifications sn_1 (cost=0.42..3.65 rows=2 width=28) (actual time=0.031..0.031 rows=0 loops=204)
Output: sn_1.reference
Index Cond: ((sn_1.reference)::text = (sn.reference)::text)
Filter: ((sn_1.data ->> 'end_date'::text) IS NOT NULL)
Rows Removed by Filter: 1
Buffers: shared hit=2075
Planning:
Buffers: shared hit=16
Planning Time: 7.447 ms
Execution Time: 1763.507 ms
Thanks in advance.
2
Answers
This should be easier to read and understand. Whether it performs "better" is a different story:
The antijoin expressed as a left join can be changed to either a
not in
(usually a bad idea because of nulls) or anexists
correlated subquery.See the query plan, this is your problem:
The databases removes 174080 records, to find 204 records. Create an index on the combination of these columns and it should be able to do an index scan, within a few milliseconds.