I have the following table that stores events (simplified structure):
ID | User | Action | Timestamp |
---|---|---|---|
12 | user1 | END | 2022-01-01 05:00 |
43 | user1 | START | 2022-01-01 04:00 |
54 | user1 | END | 2022-01-01 03:00 |
13 | user1 | START | 2022-01-01 02:00 |
I need to join two events into one row, so that every START event is paired with the next END event that comes after it.
The result should look like this:
ID1 | ID2 | User | Start Timestamp | End Timestamp |
---|---|---|---|---|
13 | 54 | user1 | 2022-01-01 02:00 | 2022-01-01 03:00 |
43 | 12 | user1 | 2022-01-01 04:00 | 2022-01-01 05:00 |
Ideally, it should not have too many performance issues, as there could be a lot of records in the table.
I've tried the following query:

    select
        s.id as "ID1",
        e.id as "ID2",
        s.user,
        s.timestamp as "Start Time",
        e.timestamp as "End Time"
    from Events s
    left join Events e on s.user = e.user
    where s.action = 'START'
    and e.action = 'END'
    and s.timestamp < e.timestamp

but it also matches record 13 to record 12.
Is it possible to join the left side to the right only once, keeping in mind that it should be the next END record time-wise?
Thanks
6 Answers
Here is a PostgreSQL solution using a lateral join. It may also work on HANA, as no Postgres-specific features are used. The inner query selects the END action for the same user that occurred soonest after the corresponding START. Events that have started but not finished yet will have NULL values for "ID2" and "End timestamp".
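The query that belonged to this answer is not preserved here, so the following is only a minimal sketch of the described approach, not the answerer's exact code. It assumes a table named events with columns id, "user", action and "timestamp"; the column names are quoted because user is a reserved word in PostgreSQL.

```sql
-- Sketch only: assumes events(id, "user", action, "timestamp").
select s.id          as "ID1",
       e.id          as "ID2",
       s."user",
       s."timestamp" as "Start timestamp",
       e."timestamp" as "End timestamp"
from events s
left join lateral (
    -- earliest END event of the same user that comes after this START
    select x.id, x."timestamp"
    from events x
    where x."user" = s."user"
      and x.action = 'END'
      and x."timestamp" > s."timestamp"
    order by x."timestamp"
    limit 1
) e on true
where s.action = 'START'
order by s."timestamp";
```

Because this is a left join, a START without any later END still produces a row, with NULL in "ID2" and "End timestamp".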
We want to get the nearest timestamp of the END event for each START event. I would go with the following approach:

[...] START events.
[...] END event using the timedelta.

Assumptions

[...] START event, the timestamps will be unique. (Same goes for END event.)

The issue with your query above is that for each start event there can be multiple end events that occur after it. However, you would like to choose the one that is closest to the start event. You can achieve this by adding an aggregation.
Here is a HANA example (it uses no HANA-specific functionality):
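The original example is missing at this point. As a hedged sketch of the aggregation idea, the query below keeps the self join from the question and takes the minimum END timestamp per START event. The table and column names (events, id, "user", action, "timestamp") are assumptions; note that quoted lowercase identifiers are case-sensitive, so on HANA they have to match the way the columns were actually created.

```sql
-- Sketch only: assumes events(id, "user", action, "timestamp").
select s.id               as "ID1",
       s."user",
       s."timestamp"      as "Start Timestamp",
       min(e."timestamp") as "End Timestamp"  -- closest END after the START
from events s
join events e
  on  e."user" = s."user"
  and e.action = 'END'
  and e."timestamp" > s."timestamp"
where s.action = 'START'
group by s.id, s."user", s."timestamp";
```

The join still produces every START/END combination before grouping, so on large tables the cost grows with the number of END events per user.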
If you need to have E.ID included, you will need to join it back to the result set. Note that there may be multiple end events with the same timestamp, which you need to handle when joining back E.ID.

If you additionally would like to include START events without a corresponding END event, you can use a left join instead, with the conditions on the END event moved into the join condition.

You can use the window function LEAD.
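A minimal sketch of the LEAD idea, under the assumption that events strictly alternate between START and END for each user; the names events, id, "user", action and "timestamp" are assumed as well. Each row looks at the next row of the same user in time order, and only START rows whose successor is an END are kept.

```sql
-- Sketch only: assumes events(id, "user", action, "timestamp")
-- and that START and END events alternate per user.
select id     as "ID1",
       next_id as "ID2",
       "user",
       "timestamp" as "Start Timestamp",
       next_ts     as "End Timestamp"
from (
    select id,
           "user",
           action,
           "timestamp",
           lead(id)          over (partition by "user" order by "timestamp") as next_id,
           lead(action)      over (partition by "user" order by "timestamp") as next_action,
           lead("timestamp") over (partition by "user" order by "timestamp") as next_ts
    from events
) t
where action = 'START'
  and next_action = 'END';
```

This reads the table once instead of self-joining it. If a START can be followed directly by another START, the first one is dropped by the final filter, which differs from the "next END after this START" rule in the question.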
One way is a lateral join that picks the smallest "end" timestamp that is greater than the "start" timestamp:
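The query itself is not preserved here. One possible shape, given only as a sketch and not as the answerer's code, is a lateral subquery that computes min() over the later END events; the names events, id, "user", action and "timestamp" are assumptions.

```sql
-- Sketch only: assumes events(id, "user", action, "timestamp").
select s.id          as "ID1",
       s."user",
       s."timestamp" as "Start Timestamp",
       e.end_ts      as "End Timestamp"
from events s
cross join lateral (
    -- smallest END timestamp of the same user after this START;
    -- an aggregate without GROUP BY always yields exactly one row
    select min(x."timestamp") as end_ts
    from events x
    where x."user" = s."user"
      and x.action = 'END'
      and x."timestamp" > s."timestamp"
) e
where s.action = 'START';
```

This variant returns only the END timestamp, not the END row's ID; a START with no later END gets NULL in "End Timestamp".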
The above is standard ANSI SQL and works (at least) in Postgres.
In Postgres I would create an index on events ("user", "timestamp") where action = 'END' to make the lateral query fast.

Solution tested in HANA SQL
Same query but excluding the records that are not the min duration
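Neither query of this answer survived, so what follows is a hedged sketch of the second step only, not the tested HANA code. It pairs each START with every later END of the same user and keeps the pair with the smallest duration. For a fixed START, the smallest duration is the earliest END, so the rows are ranked by END timestamp instead of a computed difference, which avoids database-specific date functions. Table and column names are assumed.

```sql
-- Sketch only: assumes events(id, "user", action, "timestamp").
select id1      as "ID1",
       id2      as "ID2",
       "user",
       start_ts as "Start Timestamp",
       end_ts   as "End Timestamp"
from (
    select s.id          as id1,
           e.id          as id2,
           s."user",
           s."timestamp" as start_ts,
           e."timestamp" as end_ts,
           -- rank 1 = END closest to this START; e.id breaks timestamp ties
           row_number() over (partition by s.id
                              order by e."timestamp", e.id) as rn
    from events s
    join events e
      on  e."user" = s."user"
      and e.action = 'END'
      and e."timestamp" > s."timestamp"
    where s.action = 'START'
) pairs
where rn = 1;
```

Unlike the plain aggregation, this keeps the END row's ID without a second join, but it still builds all START/END pairs per user before filtering.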