skip to Main Content

I have the following snippet of a table that I am trying to filter some identities for:

 id     fid              ts               val
44292   1243    2014-12-15 00:00:00.000   True
44292   1243    2017-02-02 00:00:00.000   False
44292   1250    2014-12-16 00:00:00.000   ABC
44293   1243    2014-12-15 00:00:00.000   True
44293   1243    2017-02-02 00:00:00.000   False
44293   1250    2014-12-16 00:00:00.000   KKK
44294   1243    2014-12-15 00:00:00.000   True
44294   1243    2017-02-02 00:00:00.000   False
44295   1243    2014-12-15 00:00:00.000   True
44295   1243    2017-02-02 00:00:00.000   False

What I am trying to do is to filter for id’s that do not have a row for fid = 1250.
So in the above example, we have id’s 44292, 44293, 44294, 44295, of which only the first two id’s have a row with fid=1250, whereas id 44294, 44295 do not have such a row.

So I would like to simply return id’s 44294, 44295; is this possible?

5

Answers


  1. Try a negative subquery, something like this:

    SELECT DISTINCT id
    FROM table
    WHERE id NOT IN ( SELECT id FROM table WHERE fid = 1250 )
    
    Login or Signup to reply.
  2. No need for a subquery and/or query the table twice.

    You can simply GROUP BY the id using FILTER to apply the condition no row having fid = 1250 should appear:

    SELECT id
    FROM yourtable
    GROUP BY id
    HAVING COUNT(*) FILTER (WHERE fid = 1250) = 0;
    

    You would use above query as subquery if you are after the entire rows with those id’s:

    SELECT *
    FROM yourtable
    WHERE 
      id IN
        (SELECT id
         FROM yourtable
         GROUP BY id
         HAVING COUNT(*) FILTER (WHERE fid = 1250) = 0);
    

    See this sample fiddle with your data.

    Login or Signup to reply.
  3. A very simple set-based method is

    select id from mytable
    except
    select id from mytable where fid = 1250;
    

    https://www.postgresql.org/docs/current/queries-union.html

    Login or Signup to reply.
  4. Also possible to use NOT EXISTS:

    select distinct id from data d
    where not exists( select 1 from data d1 where d1.id = d.id and d1.fid = 1250)
    ;
    
    Login or Signup to reply.
  5. There’s another anti-join that wasn’t mentioned:

    select distinct t1.id 
    from yourtable t1 
      left join yourtable t2 
      on t1.id=t2.id and t2.fid=1250
    where t2.id is null;
    

    It asks each id to find its fid=1250 match, then keeps only those that failed to find one.

    Conceptually it’s no different from Adam‘s and Stefanov.sm‘s not exists, Usagi Miyamoto‘s not in and Thorsten Kettner‘s except ideas shown here, the last one being the most ANSII way of doing that. In reality, PostgreSQL will optimise and execute them differently, so they won’t have the same performance.

    If you already have indexes set up (which you should):

    create index on yourtable(fid,id);
    create index on yourtable(id,fid);
    

    Then the anti-join above win (as well as Adam’s and Stefanov’s, that result in the exact same plan), finding the answer on 1.2M sample rows in around .4s. Otherwise, out of those presented, the quickest method that’s not using an index is Jonas Metzler‘s aggregation – it scans the whole table, but it does that only once and in the most optimal way available, so it can fit under .6s. However, there’s a faster way of discarding the id‘s that have a fid=1250:

    SELECT id
    FROM yourtable
    GROUP BY id
    HAVING every(fid <> 1250);
    --HAVING NOT bool_or(fid=1250);--a bit less readable but logically the same
    

    This is really only asking if all fid‘s of a given id are different from 1250. Or making sure that the opposite is false, that none of them is a 1250. As soon as this gets a 1250 for a given id, it no longer has to consider it. That seems to speed things up a bit, going below .5s on the test set.
    Jonas’ count(*) has to always count all the 1250‘s, then see if that amounts to zero after it’s done counting.

    You can see the 1.2M test here and tweak it to more closely resemble your real data set – the performance of each of the queries presented here as well as what index is best suited strongly depends on the characteristics of your data.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search