skip to Main Content

I have the next table that stores events:
(simplified structure)

ID User Action Timestamp
12 user1 END 2022-01-01 05:00
43 user1 START 2022-01-01 04:00
54 user1 END 2022-01-01 03:00
13 user1 START 2022-01-01 02:00

I need to join 2 events in one row, so any START event is accompanied by the END event that comes after that.

So the result should be the next:

ID1 ID2 User Start Timestamp End Timestamp
13 54 user1 2022-01-01 02:00 2022-01-01 03:00
43 12 user1 2022-01-01 04:00 2022-01-01 05:00

Ideally, it should not have to many performance issues, as there could be a lot of records in the table.

I’ve tried the next query:

select 
  s.id as "ID1",
  e.id as "ID2",
  s.user,
  s.time as "Start Time",
  e.time as "End Time"
from Events s
left join Events e on s.user = e.user
where s.action = 'START'
  and e.action = 'END'
  and s.timestamp < e.timestamp

but it will also match the record 13 to record 12.
Is it possible to join the left side to right only once? (keeping in mind that is should be the next END record time-wise?

Thanks

6

Answers


  1. Here is a PostgreSQL solution using lateral join. It might be working on HANA as no Postgres-specific features are used. The internal query selects the ‘END’ action for the same user that occurred soonest after the corresponding ‘START’. Events that have started but not finished yet will have NULL values for "ID2" and "End timestamp".

    create temporary table the_table(id integer, usr text, action text, ts timestamp);
    insert into the_table values 
    (12,'user1','END','2022-01-01 05:00'),(43,'user1','START','2022-01-01 04:00'),
    (54,'user1','END','2022-01-01 03:00'),(13,'user1','START','2022-01-01 02:00');
    
    select tx.id as "ID1", l.id as "ID2", tx.usr as "User", 
           tx.ts as "Start timestamp", l.ts as "End timestamp" 
    from the_table as tx
    left join lateral 
    (
      select ti.id, ti.ts
      from the_table as ti 
      where ti.action = 'END' 
        and ti.ts > tx.ts 
        and ti.usr = tx.usr
      order by ti.ts - tx.ts
      limit 1
    ) as l on true  
    where tx.action = 'START'
    order by "Start timestamp"; 
    
    Login or Signup to reply.
  2. We want to get the nearest timestamp of the END event for each START event.

    I would go with the following approach:

    1. Get the minimum greater than zero timestamp difference for each of the START events.
    2. Now find the actual END event using the timedelta.

    Assumptions

    1. At max we can have only one event which is not ended yet!
    2. For every START event, the timestamps will be unique. (Same goes for END event.
    WITH closest_to_start AS (
        SELECT 
            s.id,
            MIN(TIMESTAMPDIFF(SECOND, s.timestamp, e.timestamp)) AS min_delta
        FROM Events AS s
        INNER JOIN Events AS e ON s.user = e.user
        WHERE s.action = 'START'
        AND e.action = 'END'
        GROUP BY s.id
        HAVING min_delta >= 0
    )
    SELECT s.id, 
        e.id
    FROM Events AS s
    OUTER JOIN closest_to_start ON closest_to_start.id = s.id
    OUTER JOIN Events AS e ON e.id = s.id
    WHERE s.action = 'START'
    AND e.action = 'END'
    AND 
    (
        e.timestamp IS NULL
        OR
        TIMESTAMPDIFF(SECOND, s.timestamp, e.timestamp) = closest_to_start.min_delta
    )
    
    Login or Signup to reply.
  3. The issue with your query above is that for each start event, there can be multiple end events, which occur after. However, you would like to choose the one that’s ‘closest’ to the start event. You can achieve this by adding an additional aggregation.

    Please find a HANA example (uses no HANA specific functionality):

    CREATE TABLE TEST (ID integer, USER NVARCHAR(20), ACTION NVARCHAR(20), TIMESTAMP DATETIME)
    
    INSERT INTO TEST VALUES (12, 'user1', 'END', '2022-01-01 05:00')
    INSERT INTO TEST VALUES (43, 'user1', 'START', '2022-01-01 04:00')
    INSERT INTO TEST VALUES (54, 'user1', 'END', '2022-01-01 03:00')
    INSERT INTO TEST VALUES (13, 'user1', 'START', '2022-01-01 02:00')
    
    INSERT INTO TEST VALUES (13, 'user1', 'START', '2022-01-01 09:00')
    
    SELECT
        S.ID ID1,
        S.USER,
        S.ACTION,
        S.TIMESTAMP START_TIME,
        MIN(E.TIMESTAMP) END_TIME
    FROM TEST S
    JOIN TEST E ON (
        s.USER = e.USER AND
        s.ACTION = 'START' AND
        e.ACTION = 'END' AND
        e.TIMESTAMP >= s.TIMESTAMP
    )
    GROUP BY S.ID, S.ACTION, S.USER, S.TIMESTAMP
    

    If you need to have E.ID included, you will need to join it back to the result set. Note, that there may be multiple end events with the same timestamp, which you need to handle when joining back E.ID.

    If you additionally would like to include START events without corresponding END event, you can use the following:

    INSERT INTO TEST VALUES (13, 'user1', 'START', '2022-01-01 09:00')
    
    SELECT
        S.ID ID1,
        S.USER,
        S.ACTION,
        S.TIMESTAMP START_TIME,
        MIN(E.TIMESTAMP) END_TIME
    FROM TEST S
    LEFT JOIN TEST E ON (
        s.USER = e.USER AND
        e.ACTION = 'END' AND
        e.TIMESTAMP >= s.TIMESTAMP
    )
    WHERE s.ACTION ='START'
    GROUP BY S.ID, S.ACTION, S.USER, S.TIMESTAMP
    
    Login or Signup to reply.
  4. You can use the window function Lead.

    with Daten 
    as
    (
    Select 12 as ID, 'user1' as Benutzer, 'END' as action, '05:00' as Time
    Union
    Select 43 as ID, 'user1' as Benutzer, 'Start' as action, '04:00' as Time
    Union
    Select 54 as ID, 'user1' as Benutzer, 'END' as action, '03:00' as Time
    Union
    Select 13 as ID, 'user1' as Benutzer, 'Start' as action, '02:00' as Time
    )
    Select 
        *
    from
        (
            Select
                *,
                lead(ID,1) over (order by number) as ID2,
                lead(action,1) over (order by number) as action2,
                lead(time,1) over (order by number) as time2
            from
                (
                    Select 
                        *,
                        ROW_NUMBER() OVER(ORDER BY Benutzer,Time,action) as number
                    from
                         Daten
                ) x
        ) y
    where y.action = 'Start'
    
    Login or Signup to reply.
  5. One way is a lateral join that picks the smallest "end" timestamp that is greater than the "start" timestamp:

    select st.id as id1,
           en.id as id2,
           st."timestamp" as start_timestamp,
           en."timestamp" as end_timestamp
    from events st
      left join lateral (
          select id, "timestamp"
          from events e
          where e."user" = st."user" 
            and e.action = 'END'
          and e.timestamp >= st.timestamp
          order by "timestamp" 
          fetch first 1 row only
      ) en on true
    where st.action = 'START';
    

    The above is standard ANSI SQL and works (at least) in Postgres.

    In Postgres I would create an index on events ("user", "timestamp") where action = 'END' to make the lateral query fast.

    Login or Signup to reply.
  6. Solution tested in HANA SQL

    Same query but excluding the records that are not the min duration

    CREATE TABLE "TESTSCHEMA"."EVENTS" (ID integer, "user" NVARCHAR(20), "action" NVARCHAR(20), "timestamp" SECONDDATE);
    
    INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (12, 'user1', 'END', '2022-01-01 05:00');
    INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (43, 'user1', 'START', '2022-01-01 04:00');
    INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (54, 'user1', 'END', '2022-01-01 03:00');
    INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (13, 'user1', 'START', '2022-01-01 02:00');
    INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (13, 'user1', 'START', '2022-01-01 09:00');
    
    SELECT "ID1","ID2","Start Time","End Time" FROM
    (
     select 
     ROW_NUMBER() OVER(PARTITION BY s."ID" order by SECONDS_BETWEEN(e."timestamp",s."timestamp") DESC) AS RN,
      s."ID" as "ID1",
      e."ID" as "ID2",
      s."user",
      s."timestamp" as "Start Time",
      e."timestamp" as "End Time",
      SECONDS_BETWEEN(e."timestamp",s."timestamp") AS "Duration"
      from "TESTSCHEMA"."EVENTS" s
    left join "TESTSCHEMA"."EVENTS" e on s."user" = e."user"
    where s."action" = 'START'
      and e."action" = 'END'
      and s."timestamp" < e."timestamp"
    )WHERE RN=1
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search