skip to Main Content

I’ve tried all afternoon to dedup a table that looks like this:

ID1 | ID2 | Date       | Time            |Status  | Price 
----+-----+------------+-----------------+--------+-------
01  | A   | 01/01/2022 | 10:41:47.000000 | DDD    | 55
01  | B   | 02/01/2022 | 16:22:31.000000 | DDD    | 53
02  | C   | 01/01/2022 | 08:54:03.000000 | AAA    | 72
02  | D   | 03/01/2022 | 11:12:35.000000 | DDD    | 
03  | E   | 01/01/2022 | 17:15:41.000000 | DDD    | 67
03  | F   | 01/01/2022 | 19:27:22.000000 | DDD    | 69
03  | G   | 02/01/2022 | 06:45:52.000000 | DDD    | 78

Basically, I need to dedup based on two conditions:

  1. Status: where AAA > BBB > CCC > DDD. So, pick the highest one.
  2. When the Status is the same given the same ID1, pick the latest one based on Date and Time.

The final table should look like:

ID1 | ID2 | Date       | Time            |Status  | Price 
----+-----+------------+-----------------+--------+-------
01  | B   | 02/01/2022 | 16:22:31.000000 | DDD    | 53
02  | C   | 01/01/2022 | 08:54:03.000000 | AAA    | 72
03  | G   | 02/01/2022 | 06:45:52.000000 | DDD    | 78

Is there a way to do this in Redshift SQL / PostgreSQL?

I tried variations of this, but everytime it doesn’t work because it demands that I add all columns to the group by, so then it defeats the purpose

select a.id1,
        b.id2,
        b.date,
       b.time,
       b.status,
       b.price,
       case when (status = 'AAA') then 4
            when (status = 'BBB') then 3
            when (status= 'CCC') then 2
            when (status = 'DDD') then 1
            when (status = 'EEE') then 0
                else null end as row_order
from table1 a
left join table2 b
on a.id1=b.id1
group by id1
having row_order = max(row_order)
   and date=max(date)
   and time=max(time)

Any help at all is appreciated!

3

Answers


  1. Windowing functions are good at this:

    SELECT ID1, ID2, Date, Time, Status, Price
    FROM (
        SELECT *,
            row_number() OVER (PARTITION BY ID1 ORDER BY Status, Date DESC, Time DESC) rn
        FROM MyTable
    ) t
    WHERE rn = 1
    

    See it work here:

    https://dbfiddle.uk/uAvDz1Qn

    Login or Signup to reply.
  2. You can use ROW_NUMBER() like so:

    with cte as (
    select a.id1,
           b.id2,
           b.date,
           b.time,
           b.status,
           b.price,
           ROW_NUMBER() OVER (PARTITION BY a.id1 ORDER BY b.status ASC, b.date DESC, b.time DESC) RN
    from table1 a
    left join table2 b on a.id1=b.id1
    )
    
    select * from cte where rn = 1
    
    Login or Signup to reply.
  3. This is a typical top-1-per-group problem. The canonical solution indeed involves window functions, as demonstrated by Joel Coehoorn and Aaron Dietz.

    But Postgres has a specific extension, called distinct on, which is built exactly for the purpose of solving top-1-per-group problems. The syntax is neater, and you benefit built-in optimizations:

    select distinct on (id1) t.*
    from mytable t
    order by id1, status, "Date" desc, "Time" desc
    

    Here is a demo on DB Fiddle based on that of Joel Coehoorn.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search