skip to Main Content

I have a dataset that tracks user activity, and I need to identify consecutive rows where a specific condition is met. The table structure and sample data are as follows:

Table: UserActivity

UserID Activity Timestamp
1 Login 2023-11-01 08:00:00
1 PageView 2023-11-01 08:10:00
1 Login 2023-11-01 08:15:00
1 Logout 2023-11-01 08:20:00
2 Login 2023-11-01 09:00:00
2 Login 2023-11-01 09:05:00
2 PageView 2023-11-01 09:10:00

I want to identify cases where consecutive rows for the same user have the same Activity, along with their timestamps. For example:

For UserID = 2, there are two consecutive Login activities. Like this.

Expected Output

UserID Activity StartTimestamp EndTimestamp
2 Login 2023-11-01 09:00:00 2023-11-01 09:05:00

I have attempted using a self-join to compare each row with the previous one, but it becomes computationally expensive for large datasets. I’ve also tried using LAG and LEAD functions, but I’m struggling to group consecutive rows efficiently.

2

Answers


  1. Here is a possible solution using LAG. This solution assumes no more then 2 duplicates per UserID+Activity in a row.

    WITH CTE AS
    (
    SELECT UserID, Activity, 
            LAG(USERID) OVER (ORDER BY TimeStamp)  as PrevID, 
            LAG(Activity) OVER (ORDER BY TimeStamp)  as PrevActivity, 
            LAG(TimeStamp) OVER (ORDER BY TimeStamp)  as StartTimeStamp, 
            TimeStamp as EndTimeStamp
    FROM Example
    ORDER BY TimeStamp
    )
    SELECT UserID, Activity, StartTimeStamp, EndTimeStamp
    FROM CTE
    WHERE UserID=PrevID AND Activity=PrevActivity
    ORDER BY StartTimeStamp
    

    fiddle

    UserID Activity StartTimeStamp EndTimeStamp
    2 Login 2023-11-01 09:00:00 2023-11-01 09:05:00
    Login or Signup to reply.
  2. Alternate solution adapting strategy from stackexchange user tombom
    https://dba.stackexchange.com/questions/188775/selecting-consecutive-rows-with-same-values

    This solution handles 2+ duplicates per UserID+Activity in a row.

    SELECT userID, 
           Activity, 
           MIN(Timestamp) as starttime,
           MAX(Timestamp) as endtime
    FROM (
        SELECT
        t.*
        , @groupNumber := IF(@prev_userID != userID OR @prev_Activity != Activity, @groupNumber + 1, @groupNumber) AS gn
        , @prev_userID := userID
        , @prev_Activity := Activity
        FROM Example t
        , (SELECT @groupNumber := 0, @prev_userID := NULL, @prev_Activity := NULL) var_init_subquery
        ORDER BY userID, Timestamp
    ) sq
    GROUP BY gn, userID, Activity
    HAVING Count(*) > 1 
    

    fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search