Suppose I have a table (`DeliveryEvent`) like this, grouped by `delivery_id`, where `event_type` 2 is a ‘scheduled’ event, 3 is an ‘unscheduled’ event, and 4 is a ‘completed’ event:
id | created | event_type | delivery_id | extra |
---|---|---|---|---|
1 | 2022-10-27 18:04 | 2 | 10005 | |
2 | 2022-10-27 19:00 | 3 | 10005 | {"couldn’t deliver"} |
3 | 2022-10-27 19:20 | 2 | 10005 | |
4 | 2022-10-27 20:30 | 3 | 10005 | {"timeout"} |
5 | 2022-10-27 21:15 | 2 | 10005 | |
6 | 2022-10-27 22:40 | 3 | 10005 | {"timeout"} |
7 | 2022-10-27 22:55 | 2 | 10005 | |
8 | 2022-10-27 23:00 | 4 | 10005 | |
…
I need a result table that, for each unscheduled event caused by a ‘timeout’, includes the scheduled event that occurred just before it, so that I can compute the duration between the scheduled and the unscheduled event.
With the SELECT below, I get every combination of a scheduled event with each later unscheduled event caused by a timeout:
SELECT
    scheduled.id AS scheduled_id,
    scheduled.created AS scheduled_time,
    scheduled.event_type AS scheduled_event,
    scheduled.delivery_id AS delivery_id,
    unscheduled.id AS unscheduled_id,
    unscheduled.created AS unscheduled_time,
    unscheduled.event_type AS unscheduled_event,
    unscheduled.extra AS extra
FROM
    delivery_event scheduled
JOIN
    delivery_event unscheduled ON scheduled.delivery_id = 10005
    AND unscheduled.delivery_id = 10005
    AND unscheduled.event_type = 3
    AND scheduled.event_type = 2
    AND scheduled.created < unscheduled.created
    AND unscheduled.extra->>'timeout'
scheduled_id | scheduled_time | scheduled_event | delivery_id | unscheduled_id | unscheduled_time | unscheduled_event | extra |
---|---|---|---|---|---|---|---|
5 | 2022-10-27 21:15 | 2 | 10005 | 6 | 2022-10-27 22:40 | 3 | {"timeout"} |
3 | 2022-10-27 19:20 | 2 | 10005 | 6 | 2022-10-27 22:40 | 3 | {"timeout"} |
1 | 2022-10-27 18:04 | 2 | 10005 | 6 | 2022-10-27 22:40 | 3 | {"timeout"} |
3 | 2022-10-27 19:20 | 2 | 10005 | 4 | 2022-10-27 20:30 | 3 | {"timeout"} |
1 | 2022-10-27 18:04 | 2 | 10005 | 4 | 2022-10-27 20:30 | 3 | {"timeout"} |
Instead, I’d like to get only the following result:
scheduled_id | scheduled_time | scheduled_event | delivery_id | unscheduled_id | unscheduled_time | unscheduled_event | extra |
---|---|---|---|---|---|---|---|
5 | 2022-10-27 21:15 | 2 | 10005 | 6 | 2022-10-27 22:40 | 3 | {"timeout"} |
3 | 2022-10-27 19:20 | 2 | 10005 | 4 | 2022-10-27 20:30 | 3 | {"timeout"} |
That is, I want only the scheduled events that occurred right before the unscheduled ones: the last scheduled event before each unscheduled event caused by a timeout.
2 Answers
Try this:
I can think of two solutions. The easiest is probably a `DISTINCT ON` clause, which drops all rows with a duplicate unscheduled_id, combined with an `ORDER BY` on scheduled.created in descending order, so that the row kept for each unscheduled event is the one with the latest scheduled event.
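A minimal sketch of the `DISTINCT ON` approach, assuming PostgreSQL and the column names from the sample table (`event_type`, `created`, `extra`). The timeout filter is an assumption: the exact type of `extra` is not given in the question, so the sketch casts it to text and searches for the word; adapt it to the real column type.

```sql
-- Keep one row per unscheduled event: the one joined to the latest
-- earlier scheduled event of the same delivery.
SELECT DISTINCT ON (unscheduled.id)
    scheduled.id            AS scheduled_id,
    scheduled.created       AS scheduled_time,
    scheduled.event_type    AS scheduled_event,
    scheduled.delivery_id   AS delivery_id,
    unscheduled.id          AS unscheduled_id,
    unscheduled.created     AS unscheduled_time,
    unscheduled.event_type  AS unscheduled_event,
    unscheduled.extra       AS extra
FROM delivery_event AS unscheduled
JOIN delivery_event AS scheduled
  ON scheduled.delivery_id = unscheduled.delivery_id
 AND scheduled.event_type = 2
 AND scheduled.created < unscheduled.created
WHERE unscheduled.delivery_id = 10005
  AND unscheduled.event_type = 3
  -- Assumption: a text match stands in for the real timeout check.
  AND unscheduled.extra::text LIKE '%timeout%'
-- DISTINCT ON keeps the first row of each group, so sort the
-- scheduled events newest first within each unscheduled id.
ORDER BY unscheduled.id, scheduled.created DESC;
```

The `DISTINCT ON` expression has to match the leftmost `ORDER BY` expression, which is why `unscheduled.id` comes first in the sort. On the sample rows this should pair scheduled event 3 with unscheduled event 4, and 5 with 6.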
An alternative solution is a `LEFT JOIN LATERAL`, so that only the latest scheduled event is joined to each unscheduled event.
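The lateral variant could look roughly like this, again assuming PostgreSQL and the column names from the sample table. The text-based timeout filter and the `duration` column are assumptions added for illustration; the subtraction yields an interval only if `created` is a timestamp column.

```sql
-- For each timeout event, look up only the latest earlier scheduled event.
SELECT
    scheduled.id                            AS scheduled_id,
    scheduled.created                       AS scheduled_time,
    scheduled.event_type                    AS scheduled_event,
    unscheduled.delivery_id                 AS delivery_id,
    unscheduled.id                          AS unscheduled_id,
    unscheduled.created                     AS unscheduled_time,
    unscheduled.event_type                  AS unscheduled_event,
    unscheduled.extra                       AS extra,
    unscheduled.created - scheduled.created AS duration
FROM delivery_event AS unscheduled
LEFT JOIN LATERAL (
    -- The subquery may reference the outer row because of LATERAL.
    SELECT s.id, s.created, s.event_type
    FROM delivery_event AS s
    WHERE s.delivery_id = unscheduled.delivery_id
      AND s.event_type = 2
      AND s.created < unscheduled.created
    ORDER BY s.created DESC
    LIMIT 1
) AS scheduled ON true
WHERE unscheduled.delivery_id = 10005
  AND unscheduled.event_type = 3
  -- Assumption: a text match stands in for the real timeout check.
  AND unscheduled.extra::text LIKE '%timeout%';
```

Because the join is a `LEFT JOIN`, a timeout event with no earlier scheduled event still appears, with NULLs in the scheduled columns. Switching to `JOIN LATERAL` would drop such rows instead.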
A word of warning about joining tables like this: although joining to only the one relevant row is very convenient, it tends to come with significant performance drawbacks, so check whether the performance of either method is satisfactory for your particular circumstances.
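One way to check this is to look at the actual execution plan. The sketch below assumes PostgreSQL; the index name is hypothetical, and whether such an index helps depends on the real data and on the planner's choices.

```sql
-- Hypothetical index covering the lookup of earlier scheduled events.
CREATE INDEX IF NOT EXISTS delivery_event_lookup_idx
    ON delivery_event (delivery_id, event_type, created);

-- EXPLAIN ANALYZE runs the query and reports real timings per plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (unscheduled.id)
    scheduled.id   AS scheduled_id,
    unscheduled.id AS unscheduled_id
FROM delivery_event AS unscheduled
JOIN delivery_event AS scheduled
  ON scheduled.delivery_id = unscheduled.delivery_id
 AND scheduled.event_type = 2
 AND scheduled.created < unscheduled.created
WHERE unscheduled.event_type = 3
ORDER BY unscheduled.id, scheduled.created DESC;
```

Running the same `EXPLAIN` against the lateral version, on a realistic data volume, gives a direct comparison of the two approaches.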