I am following along the example of this excellent question and first answer:
PostgreSQL – fetch the rows which have the Max value for a column in each GROUP BY group
But I need to do something slightly different. I want to select distinctly based on events.uuid, because events.start might not be unique. But I want to order by events.start because uuid’s are generated in no particular order. Database doesn’t like it. What is a good way of doing what I really want to do?
SELECT DISTINCT ON (events.uuid) events.nickname, plays.id
FROM events
JOIN plays on plays.eventuuid = events.uuid
ORDER BY events.start;
In English I would state it like this: "For each event, give me exactly one play, and order the results by event start time."
2
Answers
You just need two ORDER BYs, which means you need two levels of select:
You can actually omit the inner ORDER BY altogether, as it will be implied. But the implied ORDER BY still needs to be on a different query level than the explicit one.
Starting in v15, the dummy alias ("foo" here) can also be omitted.
If the uuid is a primary key for the events table, then you’re in luck because the set:
Is the same as that returned by:
But just in a different order.
Having both events.start and events.uuid in the distinct clause will mean the set of the two is unique, but if uuid is already unique per event row, then adding other columns from the events table to the distinct on clause should not change the result set.