SQL ranking based on condition - Amazon Web Services

user20391531
April 20, 2023
213 views
2 votes
3 Answers

I am trying to add a column called session_id. I want to number the rows by session: if the time difference between a row's date_time and the previous row's date_time is more than 30 minutes, that row starts a new session. Here is an example of what I am trying to do:

date_diff	date_time	session_id
0	2023-01-18 00:01:40.000000	1
0	2023-01-18 00:01:42.000000	1
0	2023-01-18 00:01:46.000000	1
93	2023-01-18 01:34:38.000000	2
0	2023-01-18 01:34:38.000000	2
27	2023-01-18 02:01:59.000000	2
1	2023-01-18 02:02:00.000000	2
89	2023-01-18 03:31:40.000000	3

So whenever date_diff (in minutes) is more than 30, that row starts a new session.
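The rule itself can be stated as a simple loop. Here is a minimal Python sketch (not part of the original question) that sorts the sample timestamps and opens a new session at the first row and at every row more than 30 minutes after the one before it. The helper name `assign_sessions` is hypothetical.

```python
from datetime import datetime, timedelta

# Sample date_time values from the question.
SAMPLE = [
    "2023-01-18 00:01:40.000000",
    "2023-01-18 00:01:42.000000",
    "2023-01-18 00:01:46.000000",
    "2023-01-18 01:34:38.000000",
    "2023-01-18 01:34:38.000000",
    "2023-01-18 02:01:59.000000",
    "2023-01-18 02:02:00.000000",
    "2023-01-18 03:31:40.000000",
]


def assign_sessions(timestamps, gap=timedelta(minutes=30)):
    """Hypothetical helper: return one session id per timestamp, in sorted order.

    A new session starts at the first timestamp and at every timestamp that is
    more than `gap` after the one before it.
    """
    session_ids = []
    session = 0
    prev = None
    for ts in sorted(timestamps):
        if prev is None or ts - prev > gap:
            session += 1
        session_ids.append(session)
        prev = ts
    return session_ids


stamps = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in SAMPLE]
print(assign_sessions(stamps))  # [1, 1, 1, 2, 2, 2, 2, 3]
```

The SQL answers express the same idea set-wise: flag the rows where the gap exceeds 30 minutes, then take a running sum of the flags.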

Answers

- Josh
- April 20, 2023 at 5:26 pm
- 0 votes
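There might be a better way to do this in Redshift, which I don’t have access to, but you might try something like this:
```
SELECT date_time, date_diff,
  1 + SUM(CASE WHEN date_diff > 30 THEN 1 ELSE 0 END)
        OVER (ORDER BY date_time, date_diff DESC  -- on tied timestamps, count the session-starting row first
              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS session_id
FROM your_table
```
This flags each row whose date_diff is more than 30 with a 1. The windowed SUM() then adds up those flags in date_time order, so the running total goes up by one at the start of every new session, and adding 1 makes the first session 1 rather than 0. The explicit ROWS frame is there because Redshift requires a frame clause when an aggregate window function has an ORDER BY.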
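The running-sum idea can be tried locally without Redshift. Below is a minimal sketch, assuming Python's built-in sqlite3 module backed by SQLite 3.25 or later (needed for window functions), a hypothetical table named `mytable`, and the question's sample rows with the all-zero fractional seconds dropped. It is a stand-in for checking the logic, not a Redshift test.

```python
import sqlite3

# Sample rows from the question as (date_diff in minutes, date_time).
SAMPLE = [
    (0, "2023-01-18 00:01:40"),
    (0, "2023-01-18 00:01:42"),
    (0, "2023-01-18 00:01:46"),
    (93, "2023-01-18 01:34:38"),
    (0, "2023-01-18 01:34:38"),
    (27, "2023-01-18 02:01:59"),
    (1, "2023-01-18 02:02:00"),
    (89, "2023-01-18 03:31:40"),
]

# Running sum of "starts a new session" flags. On the tied 01:34:38 rows,
# date_diff DESC puts the session-starting row first, so the ROWS frame
# counts its flag before its twin.
QUERY = """
SELECT date_diff, date_time,
       1 + SUM(CASE WHEN date_diff > 30 THEN 1 ELSE 0 END) OVER (
           ORDER BY date_time, date_diff DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS session_id
FROM mytable
ORDER BY date_time, date_diff DESC
"""


def session_ids(rows):
    """Load rows into an in-memory SQLite table and return the session_id column."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (date_diff INTEGER, date_time TEXT)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
        return [row[2] for row in conn.execute(QUERY)]
    finally:
        conn.close()


print(session_ids(SAMPLE))  # [1, 1, 1, 2, 2, 2, 2, 3]
```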

- GMB
- April 20, 2023 at 5:27 pm
- 0 votes
One option uses a conditional window sum:
```
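select t.*,
    1 + sum(case when date_diff > 30 then 1 else 0 end) over(
        order by date_time, date_diff desc  -- on tied timestamps, count the session-starting row first
        rows between unbounded preceding and current row  -- Redshift requires an explicit frame here
    ) session_id
from mytable t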
```
If you want to compute the date difference on the fly from the timestamp column, use lag() first:
```
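select t.*,
    1 + sum(case when datediff(minute, lag_date_time, date_time) > 30 then 1 else 0 end) over(
        order by date_time, lag_date_time  -- on tied timestamps, the session-starting row (earlier lag) sorts first
        rows between unbounded preceding and current row  -- Redshift requires an explicit frame here
    ) session_id
from (
    -- lag() is null for the first row, so its case expression yields 0
    select t.*, lag(date_time) over(order by date_time) lag_date_time
    from mytable t
) t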
```
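Here is a minimal sketch of the lag()-based variant run in SQLite from Python, assuming SQLite 3.25 or later. SQLite has no datediff(minute, ...), so this swaps in strftime('%s', ...) to get Unix seconds and compares the elapsed gap against 30 * 60 seconds; that is elapsed-time semantics, which may differ slightly from a minute-based DATEDIFF near the 30-minute mark. The `id` column and table name `mytable` are hypothetical; `id` is added only so both windows order the duplicated 01:34:38 rows the same way.

```python
import sqlite3

# date_time values from the question (all-zero fractional seconds dropped).
SAMPLE = [
    "2023-01-18 00:01:40",
    "2023-01-18 00:01:42",
    "2023-01-18 00:01:46",
    "2023-01-18 01:34:38",
    "2023-01-18 01:34:38",
    "2023-01-18 02:01:59",
    "2023-01-18 02:02:00",
    "2023-01-18 03:31:40",
]

# strftime('%s', ...) yields Unix seconds as text, cast to INTEGER here.
# For the first row lag() is NULL, the comparison is NULL, and the CASE
# falls through to 0.
QUERY = """
SELECT date_time,
       1 + SUM(CASE WHEN CAST(strftime('%s', date_time) AS INTEGER)
                         - CAST(strftime('%s', lag_date_time) AS INTEGER) > 30 * 60
                    THEN 1 ELSE 0 END) OVER (
           ORDER BY date_time, id
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS session_id
FROM (
    SELECT id, date_time,
           LAG(date_time) OVER (ORDER BY date_time, id) AS lag_date_time
    FROM mytable
) AS t
ORDER BY date_time, id
"""


def session_ids(timestamps):
    """Load timestamps into an in-memory table and return session ids in time order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, date_time TEXT)")
        conn.executemany("INSERT INTO mytable (date_time) VALUES (?)",
                         [(ts,) for ts in timestamps])
        return [row[1] for row in conn.execute(QUERY)]
    finally:
        conn.close()


print(session_ids(SAMPLE))  # [1, 1, 1, 2, 2, 2, 2, 3]
```

A gap of exactly 30 minutes stays in the same session, because the test is strictly greater than 30 * 60 seconds.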

- JuanChamie
- April 20, 2023 at 5:49 pm
- 0 votes
You can achieve this using window functions in SQL. Assuming you have a table called activity with the columns date_diff and date_time, you can use the following query to calculate the session_id:
```
WITH time_diffs AS (
    SELECT *, LAG(date_time) OVER (ORDER BY date_time) AS prev_date_time
    FROM activity
),
flagged_sessions AS (
    SELECT *,
           CASE WHEN EXTRACT(EPOCH FROM (date_time - prev_date_time)) / 60 > 30
                THEN 1 ELSE 0 END AS new_session_flag
    FROM time_diffs
),
session_ids AS (
    SELECT *,
           SUM(new_session_flag) OVER (
               ORDER BY date_time, new_session_flag DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) + 1 AS session_id
    FROM flagged_sessions
)
SELECT date_diff, date_time, session_id
FROM session_ids
ORDER BY date_time;
```
In this query:

- We first fetch the previous row's date_time as prev_date_time, using the LAG window function in the time_diffs CTE.
- Then, in the flagged_sessions CTE, we compute the gap in minutes and set new_session_flag to 1 if it is more than 30 minutes, and 0 otherwise.
- Finally, in the session_ids CTE, we calculate session_id as the cumulative sum of new_session_flag plus 1. The explicit ROWS frame is required by Redshift, and sorting by new_session_flag DESC makes a session-starting row count first when timestamps tie.
- The final result is selected from the session_ids CTE and ordered by date_time.
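To see the intermediate new_session_flag column next to session_id, here is a sketch of the same three-CTE pipeline run in SQLite from Python. It assumes SQLite 3.25 or later, replaces EXTRACT(EPOCH FROM ...) (which SQLite lacks) with strftime('%s', ...), and adds a hypothetical `id` column so tied timestamps sort the same way in every window. The helper name `flags_and_sessions` is also hypothetical.

```python
import sqlite3

# Sample rows from the question as (date_diff, date_time), fractional seconds dropped.
SAMPLE = [
    (0, "2023-01-18 00:01:40"),
    (0, "2023-01-18 00:01:42"),
    (0, "2023-01-18 00:01:46"),
    (93, "2023-01-18 01:34:38"),
    (0, "2023-01-18 01:34:38"),
    (27, "2023-01-18 02:01:59"),
    (1, "2023-01-18 02:02:00"),
    (89, "2023-01-18 03:31:40"),
]

QUERY = """
WITH time_diffs AS (
    SELECT *, LAG(date_time) OVER (ORDER BY date_time, id) AS prev_date_time
    FROM activity
),
flagged_sessions AS (
    -- Gap in minutes from Unix seconds; NULL for the first row, so the flag is 0.
    SELECT *,
           CASE WHEN (CAST(strftime('%s', date_time) AS INTEGER)
                      - CAST(strftime('%s', prev_date_time) AS INTEGER)) / 60.0 > 30
                THEN 1 ELSE 0 END AS new_session_flag
    FROM time_diffs
),
session_ids AS (
    SELECT *,
           SUM(new_session_flag) OVER (
               ORDER BY date_time, id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) + 1 AS session_id
    FROM flagged_sessions
)
SELECT date_diff, date_time, new_session_flag, session_id
FROM session_ids
ORDER BY date_time, id
"""


def flags_and_sessions(rows):
    """Return (new_session_flag, session_id) pairs in date_time order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE activity "
                     "(id INTEGER PRIMARY KEY, date_diff INTEGER, date_time TEXT)")
        conn.executemany("INSERT INTO activity (date_diff, date_time) VALUES (?, ?)", rows)
        return [(flag, sid) for _, _, flag, sid in conn.execute(QUERY)]
    finally:
        conn.close()


for flag, sid in flags_and_sessions(SAMPLE):
    print(flag, sid)
```

Only the two rows that follow a gap of more than 30 minutes get a flag of 1, and the running sum turns those two flags into session ids 1, 2 and 3.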