skip to Main Content

I have a table called activity that contains values like the following:

userId |      timestamp      | action
----------------------------------------
1      | 2022-10-18 10:00:00 | OPEN
2      | 2022-10-18 10:20:00 | OPEN
1      | 2022-10-18 10:05:00 | CLOSE
2      | 2022-10-18 10:22:00 | CLOSE
...

I want to be able to get the difference between the open and close time per user for a given day like so:

desiredTable
------------------------------------------------------
userId |     start_time      | time_elapsed_in_minutes
------------------------------------------------------
1      | 2022-10-18 10:00:00 | 5
2      | 2022-10-18 10:20:00 | 2

A couple of things to note are:

  1. There is no guarantee OPEN and CLOSE rows will be back to back to each other as the table also holds a lot of other action types.
  2. Also, there is no guarantee that there will be the same number of
    OPEN and CLOSE rows due to network conditions that could result in either not being reported. I.e: user 1 can have 3 opens and 1 close, so only 1 pair needs to be calculated.

My approach:

  • create table of OPEN and CLOSE counts per user
  • userId | # opens | # closes
  • determine which number is less (# opens or # closes) and get that # of rows from the activity table
  • opens table with the # of rows determined above
    • userId | timestamp
  • closes table with the # of rows determined above
    • userId | timestamp
  • subtract close timestamp from open timestamp group by userId and truncate to minutes

Any help would be greatly appreciated!

3

Answers


  1. We group every action = 'close' with the action = 'open' that comes before it. Then we group by id and choose the timestamps.

    select    userid
             ,min(timestamp)                as start_time
             ,max(timestamp)-min(timestamp) as time_elapsed_in_minutes
    from     (
              select  *
                      ,count(case action when 'OPEN' then 1 end) over(partition by userid order by timestamp) as grp
              from   t
              where  action in('OPEN', 'CLOSE')
              ) t
    group by  userid, grp
    
    userid start_time time_elapsed_in_minutes
    1 2022-10-18 10:00:00 00:05:00
    2 2022-10-18 10:20:00 00:02:00

    Fiddle

    Login or Signup to reply.
  2. We can use lead() when next action is closed.

    select    * 
    from     (
              select   userid
                      ,timestamp                                                                                                                                                   as start_time
                      ,case lead(action) over(partition by userid order by timestamp) when 'CLOSE' then lead(timestamp) over(partition by userid order by timestamp)-timestamp end as time_elapsed_in_minutes
              from   t
              where  action in('OPEN', 'CLOSE')
              ) t
    where     time_elapsed_in_minutes is not null
    
    userid start_time time_elapsed_in_minutes
    1 2022-10-18 10:00:00 00:05:00
    2 2022-10-18 10:20:00 00:02:00

    Fiddle

    Login or Signup to reply.
  3. As suggested by @Robert Hamilton this can be directly translated into a query using window functions:

    select  userId, 
            start_time, 
            floor(extract('epoch' from duration)/60) duration_in_minutes
    from (
        select  userId,
                timestamp as start_time,
                case when action='CLOSE' and (lag(action)    over w1)='OPEN'
                     then timestamp        - (lag(timestamp) over w1)
                end as duration
      from activity
      window w1 as (partition by userId order by timestamp)) a
    where duration is not null;
    

    Fiddle.

    But the same thing can be achieved using very basic constructs – I used CTEs below only for readability:

    with 
    "nearest_close_after_each_open" as 
        ( select 
            open.userId,
            open.timestamp as start_time,
            min(close.timestamp) as end_time 
          from activity as close
            inner join activity as open
                on open.action='OPEN' and close.action='CLOSE'
                and open.userId=close.userId
                and open.timestamp < close.timestamp 
          group by 1,2),
    "longest_open_window_before_a_close" as
        ( select 
            userId, 
            end_time,
            min(start_time) as start_time 
          from nearest_close_after_each_open
          group by 1,2),
    "shortest_open_window_before_a_close" as
        ( select 
            userId, 
            end_time,
            max(start_time) as start_time 
          from nearest_close_after_each_open
          group by 1,2)  
    select 
      userId,
      start_time,
      end_time,
      floor(extract('epoch' from end_time-start_time )/60) as time_elapsed_in_minutes
    from "shortest_open_window_before_a_close";
    --from "longest_open_window_before_a_close";
    

    Fiddle.

    I’d normally expect all following OPENs in a sequence to be retransmissions of the first one, similar to what you assume about CLOSEs, which is why I also added longest_open_window_before_a_close – pairing up the earliest recorded OPEN before a CLOSE. By default, the code uses shortest_open_window_before_a_close of your choice.

    One unaddressed detail I spotted is

    difference between the open and close time per user for a given
    day

    Which I think would mean all open windows should be cut off at midnight and all orphaned CLOSES on the following date should be assumed to have opened at midnight.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search