
Suppose I have a table (DeliveryEvent) like the one below, shown here for a single delivery_id, where event_type 2 means ‘scheduled’, 3 means ‘unscheduled’, and 4 means ‘completed’:

id  created           event_type  delivery_id  extra
1   2022-10-27 18:04  2           10005
2   2022-10-27 19:00  3           10005        {"couldn’t deliver"}
3   2022-10-27 19:20  2           10005
4   2022-10-27 20:30  3           10005        {"timeout"}
5   2022-10-27 21:15  2           10005
6   2022-10-27 22:40  3           10005        {"timeout"}
7   2022-10-27 22:55  2           10005
8   2022-10-27 23:00  4           10005

I need a result table that, for each unscheduled event caused by a ‘timeout’, includes the scheduled event that occurred just before it, so that I can get the duration between the scheduled and the unscheduled event.

From the SELECT below I get every combination of a scheduled event with each later unscheduled event caused by a timeout:

SELECT
    scheduled.id as scheduled_id, 
    scheduled.created as scheduled_time, 
    scheduled.event as scheduled_event, 
    scheduled.delivery_id as delivery_id,
    unscheduled.id as unscheduled_id, 
    unscheduled.created as unscheduled_time, 
    unscheduled.event as unscheduled_event, 
    unscheduled.extra as extra
FROM
    delivery_event scheduled
JOIN
    delivery_event unscheduled ON scheduled.delivery_id = 10005 
    AND unscheduled.delivery_id = 10005 
    AND unscheduled.event = 3 
    AND scheduled.event = 2 
    AND scheduled.created < unscheduled.created
    AND unscheduled.extra->>'timeout'

scheduled_id  scheduled_time    scheduled_event  delivery_id  unscheduled_id  unscheduled_time  unscheduled_event  extra
5             2022-10-27 21:15  2                10005        6               2022-10-27 22:40  3                  {"timeout"}
3             2022-10-27 19:20  2                10005        6               2022-10-27 22:40  3                  {"timeout"}
1             2022-10-27 18:04  2                10005        6               2022-10-27 22:40  3                  {"timeout"}
3             2022-10-27 19:20  2                10005        4               2022-10-27 20:30  3                  {"timeout"}
1             2022-10-27 18:04  2                10005        4               2022-10-27 20:30  3                  {"timeout"}

but instead I’d like to only have the following result:

scheduled_id  scheduled_time    scheduled_event  delivery_id  unscheduled_id  unscheduled_time  unscheduled_event  extra
5             2022-10-27 21:15  2                10005        6               2022-10-27 22:40  3                  {"timeout"}
3             2022-10-27 19:20  2                10005        4               2022-10-27 20:30  3                  {"timeout"}

only the scheduled events that occurred right before the unscheduled ones, i.e., the last scheduled event before each unscheduled event due to timeout.

2 Answers


  1. Try this:

    SELECT a.*
    FROM (
    SELECT delivery_id
         , lag(id, 1) OVER w AS scheduled_id
         , lag(created, 1) OVER w AS scheduled_time
         , lag(event , 1) OVER w AS scheduled_event
         , id AS unscheduled_id
         , created AS unscheduled_time
         , event AS unscheduled_event
         , extra
      FROM delivery_event
     WHERE delivery_id = 10005
    WINDOW w AS (PARTITION BY delivery_id ORDER BY created ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
    ) AS a
    WHERE a.scheduled_event = 2
      AND a.unscheduled_event = 3
      -- no ::json cast: '{"timeout"}' is not valid JSON; assumes extra is text or text[]
      AND a.extra = '{"timeout"}'
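
    Since the stated goal is the duration between the two events, the same window can probably be used to compute it directly. The following is only a sketch, assuming PostgreSQL, the table and column names used in the queries above, a timestamp-typed created column, and an extra column that compares equal to the literal '{"timeout"}' (for example text or text[]). The output column scheduled_duration is a name made up for illustration. Like the query above, it relies on the row immediately before each timeout being the matching scheduled event.

    ```sql
    -- Sketch under the assumptions stated above; adjust the extra predicate
    -- to the real column type (for example jsonb) if it differs.
    SELECT delivery_id,
           scheduled_id,
           unscheduled_id,
           -- timestamp minus timestamp yields an interval in PostgreSQL
           unscheduled_time - scheduled_time AS scheduled_duration
    FROM (
        SELECT delivery_id,
               lag(id)      OVER w AS scheduled_id,
               lag(created) OVER w AS scheduled_time,
               lag(event)   OVER w AS scheduled_event,
               id      AS unscheduled_id,
               created AS unscheduled_time,
               event   AS unscheduled_event,
               extra
        FROM delivery_event
        -- one ordered timeline per delivery
        WINDOW w AS (PARTITION BY delivery_id ORDER BY created)
    ) AS a
    WHERE scheduled_event = 2
      AND unscheduled_event = 3
      AND extra = '{"timeout"}';
    ```

    For the sample rows in the question this should pair event 3 with event 4 and event 5 with event 6, i.e. gaps of 1 hour 10 minutes and 1 hour 25 minutes.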
    
  2. I can think of two solutions for this. The easiest is probably a DISTINCT ON clause, which drops all but one row for each unscheduled_id, combined with an ORDER BY on scheduled.created so that the row kept is the one with the latest scheduled event.

    SELECT DISTINCT ON (unscheduled.id)
        scheduled.id as scheduled_id, 
        scheduled.created as scheduled_time, 
        scheduled.event as scheduled_event, 
        scheduled.delivery_id as delivery_id,
        unscheduled.id as unscheduled_id, 
        unscheduled.created as unscheduled_time, 
        unscheduled.event as unscheduled_event, 
        unscheduled.extra as extra
    FROM
        delivery_event scheduled
    JOIN
        delivery_event unscheduled ON scheduled.delivery_id = 10005 
        AND unscheduled.delivery_id = 10005 
        AND unscheduled.event = 3 
        AND scheduled.event = 2 
        AND scheduled.created < unscheduled.created
        AND unscheduled.extra->>'timeout'
    -- the DISTINCT ON expression must lead the ORDER BY; the latest scheduled row is then kept
    ORDER BY unscheduled.id, scheduled.created DESC
    

    An alternative solution is a LEFT JOIN LATERAL, so that only the latest scheduled event before each unscheduled event is joined.

     SELECT
        scheduled.id as scheduled_id, 
        scheduled.created as scheduled_time, 
        scheduled.event as scheduled_event, 
        scheduled.delivery_id as delivery_id,
        unscheduled.id as unscheduled_id, 
        unscheduled.created as unscheduled_time, 
        unscheduled.event as unscheduled_event, 
        unscheduled.extra as extra
    FROM
        delivery_event unscheduled
    LEFT JOIN LATERAL (
            SELECT scheduled.id, 
            scheduled.created, 
            scheduled.event, 
            scheduled.delivery_id
            FROM delivery_event scheduled
            WHERE scheduled.delivery_id = 10005 
            AND scheduled.event = 2 
            AND scheduled.created < unscheduled.created
            ORDER BY scheduled.created DESC 
        LIMIT 1) scheduled ON TRUE 
    WHERE unscheduled.delivery_id = 10005 
        AND unscheduled.event = 3 
        AND unscheduled.extra->>'timeout'
    

    One note of warning on joins like this: although joining to only the one relevant row is very convenient, it tends to come with significant performance drawbacks, so check whether the performance of either method is satisfactory for your particular circumstances.
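
    One way to check this is to look at the plan of the per-event lookup that the lateral subquery performs. The following is only a sketch, assuming PostgreSQL and the column names used above. The index name is hypothetical, and whether such an index is worth its write cost depends on your data.

    ```sql
    -- Hypothetical index: equality columns first, then created, so the
    -- "latest scheduled event before time X" lookup can be an index scan.
    CREATE INDEX delivery_event_delivery_event_created_idx
        ON delivery_event (delivery_id, event, created DESC);

    -- Mirrors the lateral subquery for a single unscheduled event
    -- (the timestamp is event 6 from the sample data).
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT id, created
    FROM delivery_event
    WHERE delivery_id = 10005
      AND event = 2
      AND created < TIMESTAMP '2022-10-27 22:40'
    ORDER BY created DESC
    LIMIT 1;
    ```

    Running EXPLAIN (ANALYZE, BUFFERS) on the full DISTINCT ON and lateral queries in the same way shows which of the two performs better on the real table.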
