I have the following snippet of a table that I am trying to filter some identities for:
id     fid   ts                       val
44292  1243  2014-12-15 00:00:00.000  True
44292  1243  2017-02-02 00:00:00.000  False
44292  1250  2014-12-16 00:00:00.000  ABC
44293  1243  2014-12-15 00:00:00.000  True
44293  1243  2017-02-02 00:00:00.000  False
44293  1250  2014-12-16 00:00:00.000  KKK
44294  1243  2014-12-15 00:00:00.000  True
44294  1243  2017-02-02 00:00:00.000  False
44295  1243  2014-12-15 00:00:00.000  True
44295  1243  2017-02-02 00:00:00.000  False
What I am trying to do is to filter for `id`'s that do not have a row with `fid = 1250`.

So in the above example we have `id`'s 44292, 44293, 44294, 44295, of which only the first two have a row with `fid = 1250`, whereas 44294 and 44295 do not have such a row. I would therefore like to simply return `id`'s 44294 and 44295; is this possible?
5 Answers
Try a negative subquery, something like this:
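A sketch of such a negative subquery, assuming the table is named `t` (the name is not given in the question):

```sql
-- Keep only ids that never appear with fid = 1250
SELECT DISTINCT id
FROM t
WHERE id NOT IN (SELECT id FROM t WHERE fid = 1250);
```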
No need for a subquery and/or to query the table twice. You can simply `GROUP BY` the `id`, using `FILTER` to apply the condition that no row having `fid = 1250` should appear. You would use the above query as a subquery if you are after the entire rows with those `id`'s.
See this sample fiddle with your data.
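A sketch of that aggregation, assuming the table is named `t`:

```sql
-- Keep only ids whose group contains no fid = 1250 row
SELECT id
FROM t
GROUP BY id
HAVING count(*) FILTER (WHERE fid = 1250) = 0;

-- Used as a subquery to fetch the entire rows for those ids
SELECT *
FROM t
WHERE id IN (
    SELECT id
    FROM t
    GROUP BY id
    HAVING count(*) FILTER (WHERE fid = 1250) = 0
);
```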
A very simple set-based method is `EXCEPT`:
https://www.postgresql.org/docs/current/queries-union.html
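With `t` again assumed as the table name, the set-based version would look like:

```sql
-- All ids, minus the ids that have a fid = 1250 row
SELECT id FROM t
EXCEPT
SELECT id FROM t WHERE fid = 1250;
```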
It's also possible to use `NOT EXISTS`:
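A sketch of the `NOT EXISTS` variant, with the table name `t` assumed:

```sql
-- Keep ids for which no fid = 1250 row exists
SELECT DISTINCT o.id
FROM t AS o
WHERE NOT EXISTS (
    SELECT 1
    FROM t AS i
    WHERE i.id = o.id
      AND i.fid = 1250
);
```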
There’s another anti-join that wasn’t mentioned:
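One way to write that anti-join, assuming the table is named `t`:

```sql
-- LEFT JOIN each id against its potential fid = 1250 match;
-- rows where the join found nothing (i.id IS NULL) are the ids we want
SELECT DISTINCT o.id
FROM t AS o
LEFT JOIN t AS i
       ON i.id = o.id
      AND i.fid = 1250
WHERE i.id IS NULL;
```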
It asks each `id` to find its `fid = 1250` match, then keeps only those that failed to find one. Conceptually it's no different from Adam's and Stefanov.sm's `NOT EXISTS`, Usagi Miyamoto's `NOT IN` and Thorsten Kettner's `EXCEPT` ideas shown here, the last one being the most ANSI way of doing that. In reality, PostgreSQL will optimise and execute them differently, so they won't have the same performance.

If you already have indexes set up (which you should):
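The exact index definition wasn't preserved; a plausible one covering these queries might be:

```sql
-- A multicolumn index letting the fid = 1250 probes avoid full scans
-- (the column order and choice here are an assumption)
CREATE INDEX ON t (fid, id);
```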
Then the anti-join above wins (as well as Adam's and Stefanov's, which result in the exact same plan), finding the answer on 1.2M sample rows in around 0.4 s. Otherwise, out of those presented, the quickest method that's not using an index is Jonas Metzler's aggregation – it scans the whole table, but it does that only once and in the most optimal way available, so it can fit under 0.6 s.

However, there's a faster way of discarding the `id`'s that have a `fid = 1250`. This is really only asking whether all `fid`'s of a given `id` are different from `1250` – or, making sure that the opposite is false, that none of them is a `1250`. As soon as this gets a `1250` for a given `id`, it no longer has to consider it. That seems to speed things up a bit, going below 0.5 s on the test set. Jonas' `count(*)`, by contrast, always has to count all the `1250`'s, then see if that amounts to zero after it's done counting.

You can see the 1.2M test here and tweak it to more closely resemble your real data set – the performance of each of the queries presented here, as well as which index is best suited, strongly depends on the characteristics of your data.
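The faster aggregation described above can be sketched like this, again assuming the table is named `t`:

```sql
-- Keep ids where no fid equals 1250; per the answer, this can stop
-- considering an id as soon as one 1250 is seen for it
SELECT id
FROM t
GROUP BY id
HAVING NOT bool_or(fid = 1250);

-- Equivalent phrasing: all fids of the id differ from 1250
-- HAVING bool_and(fid <> 1250)
```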