I have a table called activity
that contains values like the following:
userId | timestamp | action
----------------------------------------
1 | 2022-10-18 10:00:00 | OPEN
2 | 2022-10-18 10:20:00 | OPEN
1 | 2022-10-18 10:05:00 | CLOSE
2 | 2022-10-18 10:22:00 | CLOSE
...
I want to be able to get the difference between the open and close time per user for a given day like so:
desiredTable
------------------------------------------------------
userId | start_time | time_elapsed_in_minutes
------------------------------------------------------
1 | 2022-10-18 10:00:00 | 5
2 | 2022-10-18 10:20:00 | 2
A couple of things to note are:
- There is no guarantee
OPEN
andCLOSE
rows will be back to back to each other as the table also holds a lot of other action types. - Also, there is no guarantee that there will be the same number of
OPEN
andCLOSE
rows due to network conditions that could result in either not being reported. I.e: user 1 can have 3 opens and 1 close, so only 1 pair needs to be calculated.
My approach:
- create table of OPEN and CLOSE counts per user
userId | # opens | # closes
- determine which number is less (# opens or # closes) and get that # of rows from the activity table
- opens table with the # of rows determined above
userId | timestamp
- closes table with the # of rows determined above
userId | timestamp
- subtract close timestamp from open timestamp group by userId and truncate to minutes
Any help would be greatly appreciated!
3
Answers
We group every
action = 'close'
with theaction = 'open'
that comes before it. Then we group by id and choose the timestamps.Fiddle
We can use
lead()
when nextaction
isclosed
.Fiddle
As suggested by @Robert Hamilton this can be directly translated into a query using window functions:
Fiddle.
But the same thing can be achieved using very basic constructs – I used CTEs below only for readability:
Fiddle.
I’d normally expect all following
OPEN
s in a sequence to be retransmissions of the first one, similar to what you assume aboutCLOSE
s, which is why I also addedlongest_open_window_before_a_close
– pairing up the earliest recordedOPEN
before aCLOSE
. By default, the code usesshortest_open_window_before_a_close
of your choice.One unaddressed detail I spotted is
Which I think would mean all open windows should be cut off at midnight and all orphaned
CLOSES
on the following date should be assumed to have opened at midnight.