I’m trying to detect users first sessions.
ID | event_server_date | Event | Row_number |
---|---|---|---|
1 | 2022-10-26 09:43 | abc | 1 |
1 | 2022-10-26 09:45 | cde | 2 |
1 | 2022-10-26 09:47 | ykz | 3 |
1 | 2022-10-26 09:48 | fun | 4 |
1 | 2022-10-26 09:50 | start_event | 5 |
1 | 2022-10-26 09:55 | x | 6 |
1 | 2022-10-26 09:56 | y | 7 |
1 | 2022-10-26 09:56 | z | 8 |
2 | 2022-10-26 09:12 | plz | 1 |
2 | 2022-10-26 09:15 | rck | 2 |
2 | 2022-10-26 09:15 | dsp | 3 |
2 | 2022-10-26 09:17 | vnl | 4 |
2 | 2022-10-26 09:23 | start_event | 5 |
2 | 2022-10-26 09:23 | k | 6 |
2 | 2022-10-26 09:26 | l | 7 |
Desired Output:
ID | Timestamp | Event | Row_number |
---|---|---|---|
1 | 2022-10-26 09:50 | start_event | 5 |
1 | 2022-10-26 09:55 | x | 6 |
1 | 2022-10-26 09:56 | y | 7 |
1 | 2022-10-26 09:56 | z | 8 |
2 | 2022-10-26 09:23 | start_event | 5 |
2 | 2022-10-26 09:23 | k | 6 |
2 | 2022-10-26 09:26 | l | 7 |
Real timestamp column looks like this: 1970-01-20 06:57:25.583738 UTC
I’m using event based data and my table is quite large.
Is there any way for me pick those desired rows only? And discard all events before start_event in every partition.
I’ve got this far but I have no idea how to discard unwanted events for every partition.
SELECT ID, event_server_date , Event,
row_number() over(partition by ID ORDER BY event_server_date ASC) AS Row_number
FROM `my_table`
ORDER BY event_server_date ASC
Note: I have been using SQL for only two months so I might not know the concepts you’re talking about.
2
Answers
We can use
COUNT()
as an analytic function along with a rolling sum:The count generated above will be one or greater once a row is encountered having
start_event
. We only retain rows late enough for this condition to be true.I would do it with
LAST_VALUE()
analytic function andQUALIFY
You can ignore
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
because it’s default window in analytic functions but I like it to leave it for the sake of clarity.