I am trying to add another column called session_id. I want to rank according to the condition that if the time difference between the date_time is more than 30 minutes, then that will be counted as another session. Here is an example of what I am trying to do:
date_diff | date_time | session_id |
---|---|---|
0 | 2023-01-18 00:01:40.000000 | 1 |
0 | 2023-01-18 00:01:42.000000 | 1 |
0 | 2023-01-18 00:01:46.000000 | 1 |
93 | 2023-01-18 01:34:38.000000 | 2 |
0 | 2023-01-18 01:34:38.000000 | 2 |
27 | 2023-01-18 02:01:59.000000 | 2 |
1 | 2023-01-18 02:02:00.000000 | 2 |
89 | 2023-01-18 03:31:40.000000 | 3 |
So whenever, date_diff in minutes is more than 30, that will be categorized as a new session.
3
Answers
There might be a better way to do this in Redshift, which I don’t have, but you might try something like this:
This simply flags the rows > 30 with a 1, and then the
OVER()
clause will sort and sum which would create the ordered session_id you’re looking for.One option uses a conditional window sum:
If you wanted to compute the date difference on the fly from the timestamp column, we would use
lag()
first:You can achieve this using window functions in SQL. Assuming you have a table called activity with the columns date_diff and date_time, you can use the following query to calculate the session_id:
In this query:
We first calculate the time difference between the current row and the previous row using the LAG window function in the time_diffs CTE.
Then, we create a new_session_flag column in the flagged_sessions CTE, which is 1 if the time difference is more than 30 minutes, and 0 otherwise.
Finally, we calculate the session_id by taking the cumulative sum of the new_session_flag column, and adding 1 to it in the session_ids CTE.
The final result is selected from the session_ids CTE and ordered by date_time.