I have a dataset that tracks user activity, and I need to identify consecutive rows where a specific condition is met. The table structure and sample data are as follows:
Table: UserActivity
UserID | Activity | Timestamp |
---|---|---|
1 | Login | 2023-11-01 08:00:00 |
1 | PageView | 2023-11-01 08:10:00 |
1 | Login | 2023-11-01 08:15:00 |
1 | Logout | 2023-11-01 08:20:00 |
2 | Login | 2023-11-01 09:00:00 |
2 | Login | 2023-11-01 09:05:00 |
2 | PageView | 2023-11-01 09:10:00 |
I want to identify cases where consecutive rows for the same user have the same Activity, along with their timestamps.
For example, UserID = 2 has two consecutive Login activities, which should produce the output below.
Expected Output
UserID | Activity | StartTimestamp | EndTimestamp |
---|---|---|---|
2 | Login | 2023-11-01 09:00:00 | 2023-11-01 09:05:00 |
I have attempted using a self-join to compare each row with the previous one, but it becomes computationally expensive for large datasets. I’ve also tried using the LAG and LEAD functions, but I’m struggling to group consecutive rows efficiently.
2 Answers
Here is a possible solution using LAG. This solution assumes no more than two consecutive duplicates per UserID + Activity.
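A minimal sketch of how a LAG-based pairing might look, run through Python's built-in sqlite3 module against the sample data from the question. The CTE name prev and the columns PrevActivity and PrevTimestamp are hypothetical names chosen for illustration, and the sketch assumes an SQLite build with window function support (3.25 or later); other databases may need small syntax changes.

```python
import sqlite3

# Sample rows copied from the question.
rows = [
    (1, "Login", "2023-11-01 08:00:00"),
    (1, "PageView", "2023-11-01 08:10:00"),
    (1, "Login", "2023-11-01 08:15:00"),
    (1, "Logout", "2023-11-01 08:20:00"),
    (2, "Login", "2023-11-01 09:00:00"),
    (2, "Login", "2023-11-01 09:05:00"),
    (2, "PageView", "2023-11-01 09:10:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserActivity (UserID INTEGER, Activity TEXT, Timestamp TEXT)"
)
conn.executemany("INSERT INTO UserActivity VALUES (?, ?, ?)", rows)

# For each row, fetch the previous row's activity and timestamp for the same
# user, then keep only rows whose activity matches the previous one.
# (prev, PrevActivity and PrevTimestamp are illustrative names.)
LAG_QUERY = """
WITH prev AS (
    SELECT UserID, Activity, Timestamp,
           LAG(Activity)  OVER (PARTITION BY UserID ORDER BY Timestamp) AS PrevActivity,
           LAG(Timestamp) OVER (PARTITION BY UserID ORDER BY Timestamp) AS PrevTimestamp
    FROM UserActivity
)
SELECT UserID, Activity, PrevTimestamp AS StartTimestamp, Timestamp AS EndTimestamp
FROM prev
WHERE Activity = PrevActivity
ORDER BY UserID, StartTimestamp
"""

lag_result = conn.execute(LAG_QUERY).fetchall()
print(lag_result)
# [(2, 'Login', '2023-11-01 09:00:00', '2023-11-01 09:05:00')]
```

Each user's rows are read once in timestamp order, so no self-join is needed. A run of three identical activities would come back as two overlapping pairs rather than one row, which is why this version is limited to two duplicates in a row.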
An alternative solution, adapting a strategy from Stack Exchange user tombom:
https://dba.stackexchange.com/questions/188775/selecting-consecutive-rows-with-same-values
This solution handles 2+ duplicates per UserID+Activity in a row.
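One common way to collapse runs of any length is the row-number-difference technique (often called "gaps and islands"). The sketch below uses that technique and may differ in detail from the approach in the linked answer. It again runs through Python's sqlite3 module and assumes SQLite 3.25 or later; the CTE name numbered, the column grp, and the UserID = 3 rows are hypothetical additions used only to show a run of three.

```python
import sqlite3

# Rows for users 1 and 2 come from the question; the UserID = 3 rows are
# hypothetical and exist only to demonstrate a run longer than two.
rows = [
    (1, "Login", "2023-11-01 08:00:00"),
    (1, "PageView", "2023-11-01 08:10:00"),
    (1, "Login", "2023-11-01 08:15:00"),
    (1, "Logout", "2023-11-01 08:20:00"),
    (2, "Login", "2023-11-01 09:00:00"),
    (2, "Login", "2023-11-01 09:05:00"),
    (2, "PageView", "2023-11-01 09:10:00"),
    (3, "Login", "2023-11-01 10:00:00"),
    (3, "Login", "2023-11-01 10:05:00"),
    (3, "Login", "2023-11-01 10:10:00"),
    (3, "Logout", "2023-11-01 10:15:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserActivity (UserID INTEGER, Activity TEXT, Timestamp TEXT)"
)
conn.executemany("INSERT INTO UserActivity VALUES (?, ?, ?)", rows)

# Within a run of identical activities, the per-user row number and the
# per-user-per-activity row number advance together, so their difference
# (grp) stays constant. Grouping by it collapses each run into one row.
ISLANDS_QUERY = """
WITH numbered AS (
    SELECT UserID, Activity, Timestamp,
           ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Timestamp)
         - ROW_NUMBER() OVER (PARTITION BY UserID, Activity ORDER BY Timestamp) AS grp
    FROM UserActivity
)
SELECT UserID, Activity,
       MIN(Timestamp) AS StartTimestamp,
       MAX(Timestamp) AS EndTimestamp
FROM numbered
GROUP BY UserID, Activity, grp
HAVING COUNT(*) > 1
ORDER BY UserID, StartTimestamp
"""

islands_result = conn.execute(ISLANDS_QUERY).fetchall()
for row in islands_result:
    print(row)
# (2, 'Login', '2023-11-01 09:00:00', '2023-11-01 09:05:00')
# (3, 'Login', '2023-11-01 10:00:00', '2023-11-01 10:10:00')
```

The HAVING COUNT(*) > 1 filter drops single-row runs, so only genuine repeats are returned, and each run appears once with its first and last timestamp regardless of its length.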