I have a large PostgreSQL DB table. From this table I need to take rows grouped by Car_id
and position
columns.
The problem is that I have a lot of duplicates and need to take one row with the best position
.
I wrote a sql example that gave me the correct results, but it needs to be modified. Or how can I do it in a cleaner way?
And I need to choose a unique car_id, with a minimum position, last by date of scrape, of all passed license plate numbers, I am not interested in what particular license plate number will be.
Example of SQL:
select
"eventDate",
"Car_id",
min("position") as "carPosition",
groupArray(concat(toString("scrapedAt"), '_', toString("position"))) as "scrapedAtByPosition",
groupArray(concat("licensePlate", '_', toString("position"))) as "licensePlateByPosition",
groupArray(concat(toString("amazonChoice"), '_', toString("position"))) as "amazonChoicesByPosition",
'organic' as "matchType"
from "Car1_ScrapeHistoryLicensePlate"
inner join (
select "Car_id", max("scrapedAt") as "scrapedAt"
from "Car1_ScrapeHistoryLicensePlate"
where "licensePlate" IN ('ALPR912', 'JGPD831') and "eventDate" between '2022-08-12' and '2022-09-12'
group by "Car_id", "eventDate"
) as t1 USING ("Car_id", "scrapedAt")
where "licensePlate" IN ('ALPR912', 'JGPD831') and "eventDate" between '2022-08-12' and '2022-09-12'
group by "eventDate", "Car_id"
order by "eventDate" desc;
Database records:
eventDate Car_id licensePlate position scrapedAt
---------- ------ ------------ ------- ---------
2022-09-10, 1, APRJSC512, 1, 1660000001
2022-09-10, 1, APRJSC512, 1, 1660000002
2022-09-10, 1, PLBQWN035, 1, 1660000003
2022-09-10, 1, PLBQWN035, 1, 1660000004
2022-09-10, 1, PLBQWN035, 2, 1660000002
2022-09-11, 2, APRJSC512, 1, 1660000011
2022-09-11, 2, APRJSC512, 2, 1660000022
2022-09-11, 2, PLBQWN035, 1, 1660000033
2022-09-11, 2, PLBQWN035, 2, 1660000044
2022-09-11, 2, PLBQWN035, 5, 1660000022
2022-09-12, 3, APRJSC512, 3, 1660000111
2022-09-12, 3, PLBQWN035, 3, 1660000222
2022-09-13, 4, PLBQWN035, 4, 1660001111
2022-09-14, 5, PLBQWN035, 5, 1660011111
Expected result:
eventDate Car_id licensePlate position scrapedAt
---------- ------ ------------ ------- ---------
2022-09-10, 1, PLBQWN035, 1, 1660000004
2022-09-11, 2, PLBQWN035, 1, 1660000033
2022-09-12, 3, PLBQWN035, 3, 1660000222
2
Answers
Fiddle
In PostgreSQL you can use brilliant distinct on.
The
order by
list of expressions expressions determine which record to be picked for eachcar_id
. For each group with the samecar_id
the first one is picked.DB-fiddle