skip to Main Content

I have a large PostgreSQL DB table. From this table I need to take rows grouped by Car_id and position columns.
The problem is that I have a lot of duplicates and need to take one row with the best position.
I wrote a sql example that gave me the correct results, but it needs to be modified. Or how can I do it in a cleaner way?

And I need to choose a unique car_id, with a minimum position, last by date of scrape, of all passed license plate numbers, I am not interested in what particular license plate number will be.

Example of SQL:

select
    "eventDate",
    "Car_id",
    min("position") as "carPosition",
    groupArray(concat(toString("scrapedAt"), '_', toString("position"))) as "scrapedAtByPosition",
    groupArray(concat("licensePlate", '_', toString("position"))) as "licensePlateByPosition",
    groupArray(concat(toString("amazonChoice"), '_', toString("position"))) as "amazonChoicesByPosition",
    'organic' as "matchType"
from "Car1_ScrapeHistoryLicensePlate"
         inner join (
             select "Car_id", max("scrapedAt") as "scrapedAt"
             from "Car1_ScrapeHistoryLicensePlate"
             where "licensePlate" IN ('ALPR912', 'JGPD831') and "eventDate" between '2022-08-12' and '2022-09-12'
             group by "Car_id", "eventDate"
         ) as t1 USING ("Car_id", "scrapedAt")
where "licensePlate" IN ('ALPR912', 'JGPD831') and "eventDate" between '2022-08-12' and '2022-09-12'
group by "eventDate", "Car_id"
order by "eventDate" desc;

Database records:

eventDate  Car_id  licensePlate position scrapedAt
---------- ------  ------------ -------  --------- 
2022-09-10,   1,   APRJSC512,    1,     1660000001
2022-09-10,   1,   APRJSC512,    1,     1660000002
2022-09-10,   1,   PLBQWN035,    1,     1660000003
2022-09-10,   1,   PLBQWN035,    1,     1660000004
2022-09-10,   1,   PLBQWN035,    2,     1660000002
2022-09-11,   2,   APRJSC512,    1,     1660000011
2022-09-11,   2,   APRJSC512,    2,     1660000022
2022-09-11,   2,   PLBQWN035,    1,     1660000033
2022-09-11,   2,   PLBQWN035,    2,     1660000044
2022-09-11,   2,   PLBQWN035,    5,     1660000022
2022-09-12,   3,   APRJSC512,    3,     1660000111
2022-09-12,   3,   PLBQWN035,    3,     1660000222
2022-09-13,   4,   PLBQWN035,    4,     1660001111
2022-09-14,   5,   PLBQWN035,    5,     1660011111

Expected result:

eventDate  Car_id  licensePlate position scrapedAt
---------- ------  ------------ -------  ---------
2022-09-10,   1,   PLBQWN035,    1,     1660000004
2022-09-11,   2,   PLBQWN035,    1,     1660000033
2022-09-12,   3,   PLBQWN035,    3,     1660000222

2

Answers


  1. select  eventDate
           ,Car_id 
           ,licensePlate 
           ,position 
           ,scrapedAt
    from
    (
    select  *
           ,row_number() over(partition by car_id order by position, scrapedat desc) as rn
    from    t
    ) t
    where   rn = 1
    
    eventdate car_id licenseplate position scrapedat
    2022-09-10 1 PLBQWN035 1 1660000004
    2022-09-11 2 PLBQWN035 1 1660000033
    2022-09-12 3 PLBQWN035 3 1660000222
    2022-09-13 4 PLBQWN035 4 1660001111
    2022-09-14 5 PLBQWN035 5 1660011111

    Fiddle

    Login or Signup to reply.
  2. In PostgreSQL you can use brilliant distinct on.
    The order by list of expressions expressions determine which record to be picked for each car_id. For each group with the same car_id the first one is picked.

    select distinct on (car_id) * -- or the relevant expression list here
    from the_table
    order by car_id, position, scrapedat desc;
    

    DB-fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search