I have the following snippet of a table that I am trying to filter some identities for:
id     fid   ts                       val
44292  1243  2014-12-15 00:00:00.000  True
44292  1243  2017-02-02 00:00:00.000  False
44292  1250  2014-12-16 00:00:00.000  ABC
44293  1243  2014-12-15 00:00:00.000  True
44293  1243  2017-02-02 00:00:00.000  False
44293  1250  2014-12-16 00:00:00.000  KKK
44294  1243  2014-12-15 00:00:00.000  True
44294  1243  2017-02-02 00:00:00.000  False
44295  1243  2014-12-15 00:00:00.000  True
44295  1243  2017-02-02 00:00:00.000  False
What I am trying to do is to filter for `id`'s that do not have a row with `fid = 1250`.

So in the above example we have `id`'s 44292, 44293, 44294, 44295, of which only the first two have a row with `fid = 1250`, whereas 44294 and 44295 do not have such a row. I would therefore like to simply return `id`'s 44294 and 44295; is this possible?
5 Answers
Try a negative subquery, something like this:
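A sketch of such a negative subquery, assuming the table is named `t` (the name is not given in the question):

```sql
-- Keep only ids that never appear with fid = 1250
SELECT DISTINCT id
FROM t
WHERE id NOT IN (SELECT id FROM t WHERE fid = 1250);
```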
No need for a subquery and/or to query the table twice. You can simply `GROUP BY` the `id`, using `FILTER` to apply the condition that no row having `fid = 1250` should appear. You would use the above query as a subquery if you are after the entire rows with those `id`'s.
See this sample fiddle with your data.
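A sketch of that aggregation, assuming the table is named `t`:

```sql
-- Keep only ids whose group contains no fid = 1250 row
SELECT id
FROM t
GROUP BY id
HAVING count(*) FILTER (WHERE fid = 1250) = 0;

-- Used as a subquery to fetch the entire rows for those ids
SELECT *
FROM t
WHERE id IN (
    SELECT id
    FROM t
    GROUP BY id
    HAVING count(*) FILTER (WHERE fid = 1250) = 0
);
```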
A very simple set-based method is `EXCEPT`:
https://www.postgresql.org/docs/current/queries-union.html
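With `t` again assumed as the table name, the set-based version would look like:

```sql
-- All ids, minus the ids that have a fid = 1250 row
SELECT id FROM t
EXCEPT
SELECT id FROM t WHERE fid = 1250;
```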
It's also possible to use `NOT EXISTS`:
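A sketch of the `NOT EXISTS` variant, with the table name `t` assumed:

```sql
-- Keep ids for which no fid = 1250 row exists
SELECT DISTINCT o.id
FROM t AS o
WHERE NOT EXISTS (
    SELECT 1
    FROM t AS i
    WHERE i.id = o.id
      AND i.fid = 1250
);
```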
There’s another anti-join that wasn’t mentioned:
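One way to write that anti-join, assuming the table is named `t`:

```sql
-- LEFT JOIN each id against its potential fid = 1250 match;
-- rows where the join found nothing (i.id IS NULL) are the ids we want
SELECT DISTINCT o.id
FROM t AS o
LEFT JOIN t AS i
       ON i.id = o.id
      AND i.fid = 1250
WHERE i.id IS NULL;
```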
It asks each `id` to find its `fid = 1250` match, then keeps only those that failed to find one. Conceptually it's no different from Adam's and Stefanov.sm's `NOT EXISTS`, Usagi Miyamoto's `NOT IN` and Thorsten Kettner's `EXCEPT` ideas shown here, the last one being the most ANSI way of doing that. In reality, PostgreSQL will optimise and execute them differently, so they won't have the same performance.

If you already have indexes set up (which you should):
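The exact index definition wasn't preserved; a plausible one covering these queries might be:

```sql
-- A multicolumn index letting the fid = 1250 probes avoid full scans
-- (the column order and choice here are an assumption)
CREATE INDEX ON t (fid, id);
```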
Then the anti-join above wins (as well as Adam's and Stefanov's, which result in the exact same plan), finding the answer on 1.2M sample rows in around 0.4 s. Otherwise, out of those presented, the quickest method that's not using an index is Jonas Metzler's aggregation – it scans the whole table, but it does that only once and in the most optimal way available, so it can fit under 0.6 s.

However, there's a faster way of discarding the `id`'s that have a `fid = 1250`. This is really only asking whether all `fid`'s of a given `id` are different from `1250` – or, making sure that the opposite is false, that none of them is a `1250`. As soon as this gets a `1250` for a given `id`, it no longer has to consider it. That seems to speed things up a bit, going below 0.5 s on the test set. Jonas' `count(*)`, by contrast, always has to count all the `1250`'s, then see if that amounts to zero after it's done counting.

You can see the 1.2M test here and tweak it to more closely resemble your real data set – the performance of each of the queries presented here, as well as which index is best suited, strongly depends on the characteristics of your data.
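The faster aggregation described above can be sketched like this, again assuming the table is named `t`:

```sql
-- Keep ids where no fid equals 1250; per the answer, this can stop
-- considering an id as soon as one 1250 is seen for it
SELECT id
FROM t
GROUP BY id
HAVING NOT bool_or(fid = 1250);

-- Equivalent phrasing: all fids of the id differ from 1250
-- HAVING bool_and(fid <> 1250)
```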