skip to Main Content

Table tmch contains thousands of rows here it is:

CREATE TABLE IF NOT EXISTS public.tmch (
    id bigserial NOT NULL,
    year integer,
    week integer,
    my_number integer,
    device_id bigint,
    CONSTRAINT tmch_pkey PRIMARY KEY (id)
)

Sample data:

1716446 2024    37  13  2
1716447 2024    37  13  2
1716448 2024    37  0   3
1716449 2024    37  11  4
1716450 2024    37  12  4
1716451 2024    37  0   6
1716452 2024    37  0   6
1716453 2024    37  0   6
1716454 2024    37  1   6
1716455 2024    37  1   6
1716456 2024    37  9   7

Here is a query to count how many times my_number occurs for each group of (week, year, device_id):

select count(my_number) c, my_number, device_id, year, week from tmch
group by my_number, device_id, year, week
order by device_id asc, c desc

Result of this query – based on a bigger, different sample:

6   16  2   2024    37
4   17  2   2024    37
4   15  2   2024    37
2   13  2   2024    37
1   2   2   2024    37
8   15  3   2024    37
6   16  3   2024    37
5   14  3   2024    37
3   20  3   2024    37  
3   18  3   2024    37
3   19  3   2024    37
1   8   3   2024    37

How to get only rows with the highest count of my_number per
group of (week, year, device_id)?

Result for the sample above:

6   16  2   2024    37  -- because my_number=16 occurs 6 times for device_id=2 y=2024 w=37
8   15  3   2024    37  -- because my_number=15 occurs 8 times for device_id=3 y=2024 w=37

I tried row_number() over (partition by ...) but without success.

2

Answers


  1. DISTINCT ON does what you want, if you use your query as a subquery:

    SELECT DISTINCT ON (device_id, week, year)
           c, my_number, device_id, year, week
    FROM (SELECT count(my_number) AS c,
                 my_number,
                 device_id,
                 year,
                 week
          FROM tmch
          GROUP BY my_number, device_id, year, week) AS sub
    ORDER BY device_id, week, year, c DESC;
    
    Login or Signup to reply.
  2. In a SELECT query, DISTINCT ON is applied after aggregation with GROUP BY and aggregate function. (Even after window functions.) So you can do it all in a single query level without subquery.

    And since you list my_number in the GROUP BY clause, it makes more sense to use count(*) instead of count(my_number). A bit faster, too.
    The only logical difference: if my_number can be null, you get an actual count for that group, too. Else you get a count of 0 for the null group.

    SELECT DISTINCT ON (device_id, year, week)
           count(*) AS c, my_number, device_id, year, week
    FROM   tmch
    GROUP  BY device_id, year, week, my_number
    ORDER  BY device_id, year, week, c DESC;
    

    The same query with a more intuitive order of columns:

    SELECT DISTINCT ON (device_id, year, week)
           device_id, year, week, my_number, count(*) AS c
    FROM   tmch
    GROUP  BY device_id, year, week, my_number
    ORDER  BY device_id, year, week, c DESC;
    

    Or with minimal syntax:

    SELECT DISTINCT ON (1,2,3)
           device_id, year, week, my_number, count(*) AS c
    FROM   tmch
    GROUP  BY 1, 2, 3, 4
    ORDER  BY 1, 2, 3, c DESC;
    

    fiddle

    Just make sure that ORDER BY and DISTINCT ON do not disagree. See:

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search