Say I have a table with kids and their toys.
```sql
CREATE TABLE kids_toys (
    kid_name character varying,
    toy_type character varying,
    toy_name character varying
);
```
kid_name | toy_type | toy_name |
---|---|---|
Edward | bear | Pooh |
Edward | bear | Pooh2 |
Edward | bear | Simba |
Edward | car | Vroom |
Lydia | doll | Sally |
Lydia | car | Beeps |
Lydia | car | Speedy |
Edward | car | Red |
I want to get a list of the most popular toy type for each kid, grouped by kid. So the result would be:
kid_name | toy_type | count |
---|---|---|
Edward | bear | 3 |
Lydia | car | 2 |
Assuming Postgres 15 as the engine, how would I query to do this? I keep getting stuck on how to generate the count but then only take the max result from each per-kid count.
2 Answers
First, group by `kid_name` and `toy_type` to find how many toys the kid has from each type.

Then, add a `row_number` window function partitioned only by the `kid_name` and ordered by the `count` descending to find the position of each `toy_type` from highest count to lowest for each individual kid.

And lastly, filter only the records with `row_num = 1`.
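
The three steps above can be sketched as a single query. This is a minimal sketch against the `kids_toys` table from the question, not necessarily the exact query the answer originally showed. The column alias `cnt`, the subquery alias `ranked`, and the `toy_type` tie-breaker in the window ordering are assumptions added for illustration.

```sql
-- Step 3: keep only the top-ranked toy type per kid.
select kid_name, toy_type, cnt
from (
    -- Step 1: count toys per (kid, toy type).
    -- Step 2: number the toy types within each kid, highest count first.
    select
        kid_name,
        toy_type,
        count(*) as cnt,
        row_number() over (
            partition by kid_name
            order by count(*) desc, toy_type  -- toy_type is an assumed tie-breaker
        ) as row_num
    from kids_toys
    group by kid_name, toy_type
) as ranked
where row_num = 1
order by kid_name;
```

With the sample data from the question, this should return `Edward | bear | 3` and `Lydia | car | 2`. Without the extra `toy_type` ordering, which row wins a tie would be left up to the engine.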
Also, if you would like the top 3 toys per kid, for example, you can use `row_num <= 3` instead.

In Postgres, I would recommend `distinct on`, which can get the job done in a single pass. The query groups the dataset by kid and toy. Then `distinct on` ensures that only one record is returned for each kid; the `order by` clause puts the most popular toy of each kid first. If there are ties, the first toy is picked (alphabetically).

If you wanted to retain ties (which Postgres’ `distinct on` cannot do), we could use `rank()` and `fetch with ties` instead:
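
A sketch of what the ties-preserving query could look like, assuming the `kids_toys` table from the question and Postgres 13 or later (where `fetch ... with ties` is available). The alias `cnt` is an assumption for illustration.

```sql
select kid_name, toy_type, count(*) as cnt
from kids_toys
group by kid_name, toy_type
-- Every kid's most popular toy type gets rank 1, and tied types share that rank.
-- Do not add further sort keys here: "with ties" compares the whole order by
-- list, so an extra key would stop other kids' rank-1 rows from qualifying.
order by rank() over (partition by kid_name order by count(*) desc)
fetch first row with ties;
```

Because the only sort key is the rank, every row that ties with the first row (that is, every rank-1 row across all kids) is returned. The output order across kids is not guaranteed.

For comparison, the `distinct on` approach described at the start of this answer might be written as follows. Again, this is a sketch rather than the answer's original query, and `cnt` is an assumed alias.

```sql
select distinct on (kid_name)
    kid_name,
    toy_type,
    count(*) as cnt
from kids_toys
group by kid_name, toy_type
-- distinct on keeps the first row per kid_name in this ordering:
-- highest count first, then toy_type alphabetically to break ties.
order by kid_name, count(*) desc, toy_type;
```

With the sample data, both queries should yield `Edward | bear | 3` and `Lydia | car | 2`, since neither kid has a tie for first place.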