skip to Main Content

I have db on MySQL – dataset with users actions logs. We have next data collection logic – every user have it own user_id and we record log of event actions during user session.

user_id session_id dateTime event
1 aa 2023-01-01 13:12:11 login
1 aa 2023-01-01 14:12:10 buy
1 bb 2023-01-02 11:12:10 page
2 cc 2023-01-01 10:11:01 login
2 gg 2023-01-03 11:12:11 logout
2 gg 2023-01-03 13:11:03 click
2 gg 2023-01-03 14:10:07 logout

The main goal is prepare mysql querry with raw data and add 2 new calculated columns: 1. with events action sequence during every user session and 2. with session sequence during all user lifetime by SQL.

Expected output:

user_id session_id dateTime event event_seq session_seq
1 aa 2023-01-01 13:12:11 login 1 1
1 aa 2023-01-01 14:12:10 buy 2 1
1 bb 2023-01-02 11:12:10 page 1 2
2 cc 2023-01-01 10:11:01 login 1 1
2 gg 2023-01-03 11:12:11 logout 1 2
2 gg 2023-01-03 13:11:03 click 2 2
2 gg 2023-01-03 14:10:07 logout 3 2

3

Answers


  1. you can use a query like this:

    SELECT ds.*,
    dense_rank() over (PARTITION BY user_id,session_id order BY session_id,`dateTime` ) AS event_seq,
    dense_rank() over (PARTITION BY user_id  order BY user_id,session_id   ) AS session_seq
    FROM dataset AS ds
    ORDER BY user_id,session_id,`dateTime`;
    

    sample

    mysql> SELECT * FROM dataset;
    +----+---------+------------+---------------------+--------+
    | id | user_id | session_id | dateTime            | event  |
    +----+---------+------------+---------------------+--------+
    |  1 |       1 | aa         | 2023-01-01 13:12:11 | login  |
    |  2 |       1 | aa         | 2023-01-01 14:12:10 | buy    |
    |  3 |       1 | bb         | 2023-01-02 11:12:10 | page   |
    |  4 |       2 | cc         | 2023-01-01 10:11:01 | login  |
    |  5 |       2 | gg         | 2023-01-03 11:12:11 | logout |
    |  6 |       2 | gg         | 2023-01-03 13:11:03 | click  |
    |  7 |       2 | gg         | 2023-01-03 14:10:07 | logout |
    +----+---------+------------+---------------------+--------+
    7 rows in set (0.00 sec)
    
    mysql> SELECT ds.*,
        -> dense_rank() over (PARTITION BY user_id,session_id order BY session_id,`dateTime` ) AS event_seq,
        -> dense_rank() over (PARTITION BY user_id  order BY user_id,session_id   ) AS session_seq
        -> FROM dataset AS ds
        -> ORDER BY user_id,session_id,`dateTime`;
    +----+---------+------------+---------------------+--------+-----------+-------------+
    | id | user_id | session_id | dateTime            | event  | event_seq | session_seq |
    +----+---------+------------+---------------------+--------+-----------+-------------+
    |  1 |       1 | aa         | 2023-01-01 13:12:11 | login  |         1 |           1 |
    |  2 |       1 | aa         | 2023-01-01 14:12:10 | buy    |         2 |           1 |
    |  3 |       1 | bb         | 2023-01-02 11:12:10 | page   |         1 |           2 |
    |  4 |       2 | cc         | 2023-01-01 10:11:01 | login  |         1 |           1 |
    |  5 |       2 | gg         | 2023-01-03 11:12:11 | logout |         1 |           2 |
    |  6 |       2 | gg         | 2023-01-03 13:11:03 | click  |         2 |           2 |
    |  7 |       2 | gg         | 2023-01-03 14:10:07 | logout |         3 |           2 |
    +----+---------+------------+---------------------+--------+-----------+-------------+
    7 rows in set (0.00 sec)
    
    mysql> 
    

    note: see if your MySQL Version supports WINDOW functions

    Login or Signup to reply.
  2. you can accomplish this using window functions.

    WITH session_views
         AS (SELECT user_id,
                    session_id,
                    Row_number()
                      OVER (
                        partition BY user_id) session_seq
             FROM   user_actions
             GROUP  BY user_id,
                       session_id)
    SELECT user_actions.user_id,
           user_actions.session_id,
           user_actions.dateTime,
           user_actions.event,
           Row_number()
             OVER (
               partition BY user_actions.session_id) event_seq,
           session_views.session_seq,
    FROM   user_actions
           LEFT JOIN session_views
                  ON session_views.session_id = user_actions.session_id and session_views.user_id = user_actions.user_id;
    
    Login or Signup to reply.
  3. DB fiddle

    SELECT logs.*,
    ROW_NUMBER() OVER (PARTITION BY user_id, session_id) as event_seq,
    DENSE_RANK() OVER (PARTITION BY user_id ORDER BY user_id, session_id) AS session_seq
    FROM logs;
    

    You can use window functions ROW_NUMBER() and DENSE_RANK():

    • ROW_NUMBER() enumerates rows inside of each partition which is grouped by user_id, session_id columns
    • DENSE_RANK() consequently ranks rows inside each partition which is grouped by user_id column. The order is determined in ORDER BY user_id, session_id expression.

    The documentation.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search