In this dbfiddle demo I have a DELETE FROM... WHERE
at the end like so:
......
DELETE FROM data_table
WHERE
(location, param_id, ref_time, fcst_time) NOT IN (SELECT location, param_id, ref_time, fcst_time FROM sel1)
AND
(location, param_id, ref_time, fcst_time) NOT IN (SELECT location, param_id, ref_time, fcst_time FROM sel2);
Although it works, it seems unnecessarily wordy, and possibly also not optimal in terms of performance?
Is there any way in which this can be simplified, e.g. by using a single NOT IN
statement?
Because of the way that they are defined, there is no overlap/intersection between sel1
and sel2
.
status_table
has 8033 rows
data_table
has 116432724 rows
sel1
has 61860084 rows
sel2
has 53706188 rows
Numbers don’t necessarily add up because it’s a live database and data is going in all the time. And yes, at the moment there are few if any rows to delete because most/all of the data is current.
2
Answers
How about another WITH around everything?
Answering the original question in this thread:
Changing
WHERE (a) NOT IN (b) AND NOT IN (c)
toWHERE (a) NOT IN (b UNION c)
not only shortens the expression, but also provides some performance improvements. Depending on how big an overlap there is betweenb
andc
, you can pick eitherUNION
orUNION ALL
:WHERE (a) NOT IN (b UNION c)
removes duplicates betweenb
andc
which adds a small cost initially but can speed things up later whena
is compared against the effectively smaller union.WHERE (a) NOT IN (b UNION ALL c)
skips deduplication, so if by design there can’t be any, it’s the better choice.Here’s a demo showing how plans change. Locally, I also tested a delete from 120m row table against 2x 10m tables with 99% overlap and got 118s for initial version, 105s for
union
, 98s forunion all
. With all the same counts and no overlap, it was 118s, 103s, 95s. With more columns and more complex types, I expect the difference to be more pronounced.I figured I’d make my comment into an answer, adding the tests that followed, leaving the broader optimisation to be discussed in the new thread.