skip to Main Content

I’m trying to detect users first sessions.

ID event_server_date Event Row_number
1 2022-10-26 09:43 abc 1
1 2022-10-26 09:45 cde 2
1 2022-10-26 09:47 ykz 3
1 2022-10-26 09:48 fun 4
1 2022-10-26 09:50 start_event 5
1 2022-10-26 09:55 x 6
1 2022-10-26 09:56 y 7
1 2022-10-26 09:56 z 8
2 2022-10-26 09:12 plz 1
2 2022-10-26 09:15 rck 2
2 2022-10-26 09:15 dsp 3
2 2022-10-26 09:17 vnl 4
2 2022-10-26 09:23 start_event 5
2 2022-10-26 09:23 k 6
2 2022-10-26 09:26 l 7

Desired Output:

ID Timestamp Event Row_number
1 2022-10-26 09:50 start_event 5
1 2022-10-26 09:55 x 6
1 2022-10-26 09:56 y 7
1 2022-10-26 09:56 z 8
2 2022-10-26 09:23 start_event 5
2 2022-10-26 09:23 k 6
2 2022-10-26 09:26 l 7

Real timestamp column looks like this: 1970-01-20 06:57:25.583738 UTC
I’m using event based data and my table is quite large.

Is there any way for me pick those desired rows only? And discard all events before start_event in every partition.

I’ve got this far but I have no idea how to discard unwanted events for every partition.

SELECT ID, event_server_date , Event,
row_number() over(partition by ID ORDER BY event_server_date ASC) AS Row_number

FROM `my_table` 

ORDER BY event_server_date ASC

Note: I have been using SQL for only two months so I might not know the concepts you’re talking about.

2

Answers


  1. We can use COUNT() as an analytic function along with a rolling sum:

    WITH cte AS (
        SELECT *, COUNT(CASE WHEN Event = 'start_event' THEN 1 END)
                      OVER (PARTITION BY ID ORDER BY event_server_date) cnt,
                  ROW_NUMBER() OVER (PARTITION BY makro_id
                                     ORDER BY event_server_date) AS Row_number
        FROM my_table
    )
    
    SELECT ID, event_server_date, Event, Row_number
    FROM cte
    WHERE cnt > 0
    ORDER BY ID, event_server_date;
    

    The count generated above will be one or greater once a row is encountered having start_event. We only retain rows late enough for this condition to be true.

    Login or Signup to reply.
  2. I would do it with LAST_VALUE() analytic function and QUALIFY

    SELECT * 
    FROM my_table
    QUALIFY LAST_VALUE(IF(event = 'start_event', TRUE, NULL) IGNORE NULLS) 
                 OVER (PARTITION BY ID ORDER BY event_server_date 
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    

    You can ignore ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW because it’s default window in analytic functions but I like it to leave it for the sake of clarity.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search