Postgresql - Use SQL to get session duration between two events

Greg
October 20, 2023
210 views
0 votes
2 Answers

I’ve got a dataset (in postgres specifically) that I need to calculate the ‘user session time’ for. I’m fairly positive I need to use partitioning to answer the question, but I’m not really sure how to go about it. The events table as an id, a user_id, the event category, and a date. When a user signs in, we put an event in the table for that user. Each time they visit a page, there’s a similar event. The table looks similar to this:

id	user_id	category	date
1	100	Sign-In	2023-12-20 01:07:03
2	100	Page View	2023-12-20 01:07:05
3	100	Page View	2023-12-20 01:09:17
4	107	Sign-In	2023-12-20 01:09:19
5	107	Page View	2023-12-20 01:09:21
6	107	Page View	2023-12-20 01:09:50
7	100	Page View	2023-12-20 01:11:28
..	..	..	..
784	100	Sign-In	2023-12-21 17:37:31

To calculate the session time from user 100, let’s assume that event 7 is their last page view in the system for that session. We need to get the Sign-In event (id 1), find the last event before the Sign-In (id 7), and subtract the two times. That’d be 2023-12-20 01:11:28 - 2023-12-20 01:07:03, or 4 minutes and 25 seconds. I’d want to do this for each user/sign-in combination in the dataset. For completeness, it’s possible that the user doesn’t have any Page Views, in which case their session time is zero. The results should look something like this:

user_id	session_start	duration
100	2023-12-20 01:07:03	00:04:25
100	2023-12-21 17:37:31	null (or 00:00:00)
107	2023-12-20 01:09:19	00:00:31

My guess is that I want to partition by user, and then by the event category, but I need the row before the one selected potentially, and not the next window. In code, I’d normally just take all the events, order them by user then by duration, and start to build a new result set by just iterating through and creation a ‘session’ each time I have a sign-in event. I’m just curious if this could be done in SQL on the database more efficiently.

Thanks for any help or pointers!

Answers

if you assemble the queries in a CTE something like the following should work :-

with sign_in_table as ( 
  select user_id, date
  from events 
  where category = 'Sign-in' ),

last_activity_table as ( 
  select user_id , max(date) as date
  from events
  where category <> 'Sign-in'
  group by user) 

select user_id, 
       sign_in_table.date as session_start, 
       last_activity_table.date - sign_in_table.date as duration
from sign_in_table
left join last_activity_table on (user_id)

- Zegarek
- October 20, 2023 at 8:04 pm
- 0 votes
0
You can trade the window function for a clunky scalar subquery if you find it more familiar or intuitive. After all, from each Sign-In row, you’re just looking ahead for maximum date for the same user, that’s before the next Sign-In: demo
```
select user_id,
       date as sign_in_time,
       (select max(date) from test t2 
        where t2.user_id=t1.user_id
        and t2.date>=t1.date
        and category<>'Sign-In'
        and date<=(
            select coalesce(min(t3.date),'infinity'::timestamp) 
            from test t3 where category='Sign-In'
            and t3.date>t1.date and t3.user_id=t1.user_id)
       )-date as session_duration
from test t1
where category='Sign-In'
order by 1,2;
```
user_id sign_in_time session_duration

100 2023-12-20 01:07:03+00 00:04:25

100 2023-12-21 17:37:31+00 null

107 2023-12-20 01:09:19+00 00:00:31
Login or Signup to reply.

user_id	sign_in_time	session_duration
100	2023-12-20 01:07:03+00	00:04:25
100	2023-12-21 17:37:31+00	null
107	2023-12-20 01:09:19+00	00:00:31

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Use SQL to get session duration between two events

Answers