Postgresql - How to select only rows with the max count per column value in a subgroup?

tigu
September 18, 2024
84 views
0 votes
2 Answers

Table tmch contains thousands of rows here it is:

CREATE TABLE IF NOT EXISTS public.tmch (
    id bigserial NOT NULL,
    year integer,
    week integer,
    my_number integer,
    device_id bigint,
    CONSTRAINT tmch_pkey PRIMARY KEY (id)
)

Sample data:

1716446 2024    37  13  2
1716447 2024    37  13  2
1716448 2024    37  0   3
1716449 2024    37  11  4
1716450 2024    37  12  4
1716451 2024    37  0   6
1716452 2024    37  0   6
1716453 2024    37  0   6
1716454 2024    37  1   6
1716455 2024    37  1   6
1716456 2024    37  9   7

Here is a query to count how many times my_number occurs for each group of (week, year, device_id):

select count(my_number) c, my_number, device_id, year, week from tmch
group by my_number, device_id, year, week
order by device_id asc, c desc

Result of this query – based on a bigger, different sample:

6   16  2   2024    37
4   17  2   2024    37
4   15  2   2024    37
2   13  2   2024    37
1   2   2   2024    37
8   15  3   2024    37
6   16  3   2024    37
5   14  3   2024    37
3   20  3   2024    37  
3   18  3   2024    37
3   19  3   2024    37
1   8   3   2024    37

How to get only rows with the highest count of my_number per
group of (week, year, device_id)?

Result for the sample above:

6   16  2   2024    37  -- because my_number=16 occurs 6 times for device_id=2 y=2024 w=37
8   15  3   2024    37  -- because my_number=15 occurs 8 times for device_id=3 y=2024 w=37

I tried row_number() over (partition by ...) but without success.

Answers

- LaurenzAlbe
- September 17, 2024 at 3:27 pm
- 0 votes
0
DISTINCT ON does what you want, if you use your query as a subquery:
```
SELECT DISTINCT ON (device_id, week, year)
       c, my_number, device_id, year, week
FROM (SELECT count(my_number) AS c,
             my_number,
             device_id,
             year,
             week
      FROM tmch
      GROUP BY my_number, device_id, year, week) AS sub
ORDER BY device_id, week, year, c DESC;
```
Login or Signup to reply.

- ErwinBrandstetter
- September 18, 2024 at 12:21 pm
- 0 votes
0
In a SELECT query, DISTINCT ON is applied after aggregation with GROUP BY and aggregate function. (Even after window functions.) So you can do it all in a single query level without subquery.

And since you list my_number in the GROUP BY clause, it makes more sense to use count(*) instead of count(my_number). A bit faster, too.
The only logical difference: if my_number can be null, you get an actual count for that group, too. Else you get a count of 0 for the null group.
```
SELECT DISTINCT ON (device_id, year, week)
       count(*) AS c, my_number, device_id, year, week
FROM   tmch
GROUP  BY device_id, year, week, my_number
ORDER  BY device_id, year, week, c DESC;
```
The same query with a more intuitive order of columns:
```
SELECT DISTINCT ON (device_id, year, week)
       device_id, year, week, my_number, count(*) AS c
FROM   tmch
GROUP  BY device_id, year, week, my_number
ORDER  BY device_id, year, week, c DESC;
```
Or with minimal syntax:
```
SELECT DISTINCT ON (1,2,3)
       device_id, year, week, my_number, count(*) AS c
FROM   tmch
GROUP  BY 1, 2, 3, 4
ORDER  BY 1, 2, 3, c DESC;
```
fiddle

Just make sure that ORDER BY and DISTINCT ON do not disagree. See:
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – How to select only rows with the max count per column value in a subgroup?

Answers