skip to Main Content

I have the following sql query that is returning time spent logged in for users

select sum(duration) as time_spent, date_trunc('day', date)::date as date
  from user_activities
  where name = 'session'
  and user_id = '#{user_id}'
  group by date
  order by 2

It’s working mostly fine, but I found some edge cases where users were logged in on a given day for more than 24 hours. When I dug into this, I discovered that one user had the following start/end time in their sessions on a given day which summed to more than 24hrs:

[2023-04-24 00:00:54.457183 UTC, 2023-04-24 00:02:01.388673 UTC],                             
[2023-04-24 00:13:22.536908 UTC, 2023-04-24 18:38:02.108004 UTC],                             
[2023-04-24 23:35:56.81623 UTC, 2023-04-25 20:24:20.60667 UTC]

so they were logged in at midnight for two minutes, logged in again from midnight to 6pm the next day and then again at 11pm until 8pm the next day. the query is summing up all of those times (which in fact does make sense, that’s what i’m asking it to do), but I’d like instead to sum up time from midnight to midnight so that this would make a duration of (approximately) 2 minutes (first session) + 1105 minutes (second session) + 25 minutes (last session, up to midnight of the given day).

The relevant columns of the user_activities table looks like this:

:name=>"session",                                                                                    
:user_id=>"xxx",                                                                                                               
:start_time=>2023-04-24 23:35:56.81623 UTC,                                                       
:end_time=>2023-04-25 20:24:20.60667 UTC,                                                         
:duration=>1,                                                                                      
:date=>#<Date: 2023-04-30 ((2460065j,0s,0n),+0s,2299161j)>,                                        

The research I’ve done makes me thing I can use between (such as here) but i haven’t found a way to make that work while using date_trunc. I’m sure there is a way, but I’m a novice sql user at best and this is really giving me trouble. Thanks for any help!

2

Answers


  1. The query below shows sample data of your user adding an other user with a session over two days (to demonstrate the principle)

    The subquery day_cnt calculates the minimal start date of the sessions and the count_days that is covered with the sessions.

    This is used in subquery cal to generate a list of all dates in your data.

    This list of the dates is used to make the midnight split in the join in dt2, where also the corrected start and end timestamps are calculated (suffixed with 2)

    Finally the extract ( epoch is used to get the duration in minutes of the splitted sessions.

    Query with Sample Data

    with dt as ( 
    select * from (values
    ('xxx', '2023-04-24 00:00:54.457183 +02'::TIMESTAMPTZ, '2023-04-24 00:02:01.388673 +02'::TIMESTAMPTZ),                             
    ('xxx', '2023-04-24 00:13:22.536908 +02'::TIMESTAMPTZ, '2023-04-24 18:38:02.108004 +02'::TIMESTAMPTZ),                             
    ('xxx', '2023-04-24 23:35:56.81623 +02'::TIMESTAMPTZ, '2023-04-25 20:24:20.60667 +02'::TIMESTAMPTZ),
    ('yyy', '2023-04-24 23:35:00.00000 +02'::TIMESTAMPTZ, '2023-04-26 20:00:00.00000 +02'::TIMESTAMPTZ)
    ) dt(user_id, start_time, end_time)
    ),
    day_cnt as (
    select min(start_time::date) start_date, (max(end_time::date) - min(start_time::date))+1 count_days from dt),
    cal as (
    select  start_date + generate_series( 1, count_days)-1 cal_date
    from day_cnt),
    dt2 as (
    select dt.*, cal.cal_date,
    greatest(dt.start_time,cal.cal_date::TIMESTAMPTZ) start_time2,
    least(dt.end_time, (cal.cal_date+1)::TIMESTAMPTZ) end_time2
    from dt
    join cal on cal.cal_date between dt.start_time::date and dt.end_time::date
    )
    select dt2.*,
    round(extract(epoch from (end_time2 - start_time2) )/60) duration
    from dt2
    

    Output

    user_id|start_time                   |end_time                     |cal_date  |start_time2                  |end_time2                    |duration|
    -------+-----------------------------+-----------------------------+----------+-----------------------------+-----------------------------+--------+
    xxx    |2023-04-24 00:00:54.457183+02|2023-04-24 00:02:01.388673+02|2023-04-24|2023-04-24 00:00:54.457183+02|2023-04-24 00:02:01.388673+02|       1|
    xxx    |2023-04-24 00:13:22.536908+02|2023-04-24 18:38:02.108004+02|2023-04-24|2023-04-24 00:13:22.536908+02|2023-04-24 18:38:02.108004+02|    1105|
    xxx    | 2023-04-24 23:35:56.81623+02| 2023-04-25 20:24:20.60667+02|2023-04-24| 2023-04-24 23:35:56.81623+02|       2023-04-25 00:00:00+02|      24|
    xxx    | 2023-04-24 23:35:56.81623+02| 2023-04-25 20:24:20.60667+02|2023-04-25|       2023-04-25 00:00:00+02| 2023-04-25 20:24:20.60667+02|    1224|
    yyy    |       2023-04-24 23:35:00+02|       2023-04-26 20:00:00+02|2023-04-24|       2023-04-24 23:35:00+02|       2023-04-25 00:00:00+02|      25|
    yyy    |       2023-04-24 23:35:00+02|       2023-04-26 20:00:00+02|2023-04-25|       2023-04-25 00:00:00+02|       2023-04-26 00:00:00+02|    1440|
    yyy    |       2023-04-24 23:35:00+02|       2023-04-26 20:00:00+02|2023-04-26|       2023-04-26 00:00:00+02|       2023-04-26 20:00:00+02|    1200|
    

    The actual aggregation is left out as an exercise.

    Login or Signup to reply.
  2. The following query returns the sum of the daily session times in seconds:

    SELECT SUM(EXTRACT(EPOCH FROM (  LEAST(ua.end_time, d.session_day + INTERVAL '1' DAY)
                                   - GREATEST(ua.start_time, d.session_day)))) AS daily_time_seconds,
           d.session_day
      FROM user_activities ua
      CROSS JOIN generate_series(start_time::date, end_time::date, INTERVAL '1' DAY) d(session_day)
     WHERE name = 'session'
       AND user_id = '#{user_id}'
     GROUP BY d.session_day
     ORDER BY d.session_day;
    

    This query depends on all timestamps being in the same time zone. If mixed time zones are used, then the query should be modified to translate each timestamp to a common time zone.

    A few notes regarding issues in the original question:

    1. Avoid using keyword or built-in function names as column identifiers (e.g., ‘DATE’).
    2. Use names instead of ordinals in GROUP and ORDER BY clauses to improve readability and maintainability.
    3. Use bound parameters instead of string interpolation to minimize the risk of SQL injection attacks.
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search