skip to Main Content

Say I have a table with kids and their toys.

CREATE TABLE kids_toys (
  kid_name character varying,
  toy_type character varying,
  toy_name character varying
);
kid_name toy_type toy_name
Edward bear Pooh
Edward bear Pooh2
Edward bear Simba
Edward car Vroom
Lydia doll Sally
Lydia car Beeps
Lydia car Speedy
Edward car Red

I want to get a list of the the most popular toy type for each kid, grouped by kid. So the result would be

kid_name toy_type count
Edward bear 3
Lydia car 2

Assuming Postgres 15 as the engine, how would I query to do this? I keep getting stuck on how to generate the count but then only take the max result from each per-kid count.

2

Answers


  1. First, group by kid_name and toy_type to find how many toys the kid has from each type.

    Then, add a row_number window function partitioned only by the kid_name and order by the count descending to find the position of each toy_type from highest count to lowest for each individual kid

    And lastly, filter only the records with row_num = 1

    Also, if you would like the top 3 toys per kid for example, you can use row_num <= 3 instead

    select kid_name, toy_type, cnt
    from
    (select kid_name, toy_type, cnt, row_number() over(partition by kid_name order by cnt desc) as row_num
      from (
        select kid_name, toy_type, count(*) as cnt
        from kids_toys
        group by kid_name, toy_type
      ) as grouped
    ) as with_row_num
    where row_num = 1
    
    Login or Signup to reply.
  2. In Postgres, I would recommend distinct on, which can get the job done in a single pass:

    select distinct on (kid_name) kid_name, toy_type, count(*) cnt
    from kids_toys
    group by kid_name, toy_type
    order by kid_name, count(*) desc, toy_type
    

    The query groups the dataset by kid and toy. Then distinct on ensures that only one record is returned for each kid; the order by clause puts the most popular toy of each kid first. If there are ties, the first toy is picked (alphabetically).


    If you wanted to retain ties (which Postgres’ distinct on cannot do), we could use rank() and fetch with ties instead:

    select kid_name, toy_type, count(*) cnt
    from kids_toys
    group by kid_name, toy_type
    order by rank() over(partition by kid_name order by count(*) desc)
    fetch first row with ties
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search