skip to Main Content

I have a table items and a table batches. A batch can have n items associated by items.batch_id.

I’d like to write a query item counts in two groups batched and unbatched:

  1. items WHERE batch_id IS NOT NULL (batched)
  2. items WHERE batch_id IS NULL (unbatched)

The result should look like this

batched unbatched
1200000 100

Any help appreciated, thank you!

EDIT:
I got stuck with using GROUP BY which turned out to be the wrong tool for the job.

2

Answers


  1. You can use COUNT with `FILTER( WHERE)

    it is called conditional count

    CREATE TABLE items(item_id int, batch_id int)
    
    CREATE TABLE
    
    INSERT INTO items VALUEs(1,NULL),(2,NULL),(3,1)
    
    INSERT 0 3
    
    CREATE tABLe batch (batch_id int)
    
    CREATE TABLE
    
    select 
           count(*) filter (WHERE batch_id IS NOT NULL ) as "matched"
      ,
             count(*) filter (WHERE batch_id IS NULL ) as "unmatched"
    from items 
    
    matched unmatched
    1 2
    SELECT 1
    

    fiddle

    Login or Signup to reply.
  2. The count() function seems to be the most likely basic tool here. Given an expression, it returns a count of the number of rows where that expression evaluates to non-null. Given the argument *, it counts all rows in the group.

    To the extent that there is a trick, it is getting the batched an unbatched counts in the same result row. There are at least three ways to do that:

    • Using subqueries:
    select
      (select count(batch_id) from items) as batched,
      (select count(*) from items where batch_id is null) as unbatched
    -- no FROM
    

    That’s pretty straightforward. Each subquery is executed and produces one column of the result. Because no FROM clause is given in the outer query, there will be exactly one result row.

    • Using window functions:
    select
      count(batch_id) over () as batched,
      (count(*) over () - count(batch_id) over ()) as unbatched
    from items
    limit 1
    

    That will compute the batched and unbatched results for the whole table on every result row, one per row of the items table, but then only one result row is actually returned. It is reasonable to hope (though you would definitely want to test) that postgres doesn’t actually compute those counts for all the rows that are culled by the limit clause. You might, for example, compare the performance of this option with that of the previous option.

    • Using count() with a filter clause, as described in detail in another answer.
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search