skip to Main Content

I hope somebody can help me with a tricky Kusto query.

I have the following dataset, which contains events raised by user, where events are represented with event code:

datatable(CreatedDate:datetime, User:string, EventCode:string)
[
    datetime(4-22-2024 12:44:02.750 PM), "user1", "TS",
    datetime(4-23-2024 4:09:30.551 AM), "user1", "TD",
    datetime(4-23-2024 4:09:59.067 AM), "user1", "SP",
    datetime(4-23-2024 7:10:02.052 AM), "user1", "TD",
    datetime(4-23-2024 7:12:05.357 AM), "user1", "TC",
    datetime(4-25-2024 5:11:02.649 AM), "user1", "TD",
    datetime(4-25-2024 5:12:56.672 AM), "user1", "TC",
    datetime(4-23-2024 9:53:12.315 AM), "user2", "TS",
    datetime(4-25-2024 4:36:33.656 AM), "user2", "TD",
    datetime(4-25-2024 4:38:46.922 AM), "user2", "TC",
    datetime(4-22-2024 12:40:35.801 PM), "user3", "TS",
    datetime(4-23-2024 4:13:09.379 AM), "user3", "TD",
    datetime(4-23-2024 4:13:23.724 AM), "user3", "TS",
    datetime(4-23-2024 4:14:23.724 AM), "user3", "TC",
    datetime(4-25-2024 4:34:18.966 AM), "user3", "TD",
    datetime(4-25-2024 4:41:07.381 AM), "user3", "TC",
]
| order by User asc, CreatedDate asc

I need to pivot this data to have a single day per user represented as a single line, which shows all of the complete events with their start and end date on a single line.

Event codes:

  • TD – Event Start code
  • TC or SP – Event End code
  • TS and rest of the codes – Can be ignored

There can be intermediary events between TD and (TC or SP) (which don’t affect anything), or there could be events sequences like TD, TD, TC, in which case only the last sequence should be taken into account.

And event in the output table should be represented by the event end code (TC or SP)

So the expected output for the data above would be this:

Created Day User Total Elapsed Event 1 Event 1 – From Time Event 1 – To Time Event 1 – Elapsed Event 2 Event 2 – From Time Event 2 – To Time Event 2 – Elapsed
2024-04-23 user1 00:02:31 SP 4:09:30.551 AM 4:09:59.067 AM 00:00:28 TC 7:10:02.052 AM 7:12:05.357 AM 00:02:03
2024-04-25 user1 00:01:54 TC 5:11:02.649 AM 5:12:56.672 AM 00:01:54
2024-04-25 user2 00:02:13 TC 4:36:33.656 AM 4:38:46.922 AM 00:02:13
2024-04-23 user3 00:01:14 TC 4:13:09.379 AM 4:14:23.724 AM 00:01:14
2024-04-25 user3 00:06:48 TC 4:34:18.966 AM 4:41:07.381 AM 00:06:48

There can be many more events raised per day for the user (up to 10-15 events maybe), which would stretch it horizontally.
Also please don’t mind date formatting, it can be anything, I just used the first available one.

2

Answers


  1. Chosen as BEST ANSWER

    Finally figured it out.

    Here is the query:

    let groupedEvents = datatable(CreatedDate:datetime, User:string, EventCode:string)
    [
        datetime(4-22-2024 12:44:02.750 PM), "user1", "TS",
        datetime(4-23-2024 4:09:30.551 AM), "user1", "TD",
        datetime(4-23-2024 4:09:59.067 AM), "user1", "SP",
        datetime(4-23-2024 7:10:02.052 AM), "user1", "TD",
        datetime(4-23-2024 7:12:05.357 AM), "user1", "TC",
        datetime(4-25-2024 5:11:02.649 AM), "user1", "TD",
        datetime(4-25-2024 5:12:56.672 AM), "user1", "TC",
        datetime(4-23-2024 9:53:12.315 AM), "user2", "TS",
        datetime(4-25-2024 4:36:33.656 AM), "user2", "TD",
        datetime(4-25-2024 4:38:46.922 AM), "user2", "TC",
        datetime(4-22-2024 12:40:35.801 PM), "user3", "TS",
        datetime(4-23-2024 4:13:09.379 AM), "user3", "TD",
        datetime(4-23-2024 4:13:23.724 AM), "user3", "TS",
        datetime(4-23-2024 4:14:23.724 AM), "user3", "TC",
        datetime(4-25-2024 4:34:18.966 AM), "user3", "TD",
        datetime(4-25-2024 4:41:07.381 AM), "user3", "TC",
    ]
    | order by User asc, CreatedDate asc
    | where EventCode in ('TD', 'TC', 'SP') 
    | extend EventStatus = iff(EventCode == 'TD', 'start', 'end')
    | sort by CreatedDate asc, User desc
    | extend NextUser = next(User), NextCreatedDate = next(CreatedDate), NextEventStatus = next(EventStatus), NextEventCode = next(EventCode)
    | extend CreatedDateOnly = bin(CreatedDate, 1d), NextCreatedDateOnly = bin(NextCreatedDate, 1d)
    | where User == NextUser and EventStatus == 'start' and NextEventStatus == 'end' and CreatedDateOnly == NextCreatedDateOnly
    | project CreatedDate = format_datetime(CreatedDateOnly, "yyyy-MM-dd"), User, Event = NextEventCode, From = format_datetime(CreatedDate, "HH:mm:ss"), To = format_datetime(NextCreatedDate, "HH:mm:ss"), Elapsed = bin(NextCreatedDate - CreatedDate, 1s );
    groupedEvents
    | summarize TotalElapsed = sum(Elapsed), EventsCount = count() by CreatedDate, User
    | join kind=inner  (
        groupedEvents
        | sort by CreatedDate asc, User asc
        | extend PrevCreatedDate = prev(CreatedDate), PrevUser = prev(User)
        | project CreatedDate, User, ColumnValue = strcat_delim('`',Event,From,To,Elapsed), EventNo = row_number(1, CreatedDate != PrevCreatedDate or User != PrevUser)
        | mv-apply with_itemindex = i split(ColumnValue, '`') on (
            extend i = i + 1 
        )
        | extend ColumnName = strcat("Event ", EventNo, case(i == 1, "", i == 2, " - From", i == 3, " - To", i == 4, " - Elapsed", "")) 
        | project-away EventNo, i
        | evaluate pivot (ColumnName, take_any(ColumnValue))
    ) on CreatedDate, User
    | project-away CreatedDate1, User1
    | sort by User asc, CreatedDate asc
    

    Here is the output:

    CreatedDate User TotalElapsed EventsCount Event 1 Event 1 - Elapsed Event 1 - From Event 1 - To Event 2 Event 2 - Elapsed Event 2 - From Event 2 - To
    2024-04-23 user1 00:02:31 2 SP 00:00:28 04:09:30 04:09:59 TC 00:02:03 07:10:02 07:12:05
    2024-04-25 user1 00:01:54 1 TC 00:01:54 05:11:02 05:12:56
    2024-04-25 user2 00:02:13 1 TC 00:02:13 04:36:33 04:38:46
    2024-04-23 user3 00:01:14 1 TC 00:01:14 04:13:09 04:14:23

    Hope it helps someone in the future as well


  2. The following KQL comes close to what you want to achieve, might be other solutions that solve your problem in a better way.
    Several steps have to be done:

    1. Identifying an event. For this KQL offers several options. Often used are the row_window_session or the scan-operator.
    2. putting the event-information in a property bag with the bag_pack function
    3. pivot the events with the pivot-plugin

    This would look like as follows:

        datatable(CreatedDate:datetime, User:string, EventCode:string)
    [
        datetime(4-22-2024 12:44:02.750 PM), "user1", "TS",
        datetime(4-23-2024 4:09:30.551 AM), "user1", "TD",
        datetime(4-23-2024 4:09:59.067 AM), "user1", "SP",
        datetime(4-23-2024 7:10:02.052 AM), "user1", "TD",
        datetime(4-23-2024 7:12:05.357 AM), "user1", "TC",
        datetime(4-25-2024 5:11:02.649 AM), "user1", "TD",
        datetime(4-25-2024 5:12:56.672 AM), "user1", "TC",
        datetime(4-23-2024 9:53:12.315 AM), "user2", "TS",
        datetime(4-25-2024 4:36:33.656 AM), "user2", "TD",
        datetime(4-25-2024 4:38:46.922 AM), "user2", "TC",
        datetime(4-22-2024 12:40:35.801 PM), "user3", "TS",
        datetime(4-23-2024 4:13:09.379 AM), "user3", "TD",
        datetime(4-23-2024 4:13:23.724 AM), "user3", "TS",
        datetime(4-23-2024 4:14:23.724 AM), "user3", "TC",
        datetime(4-25-2024 4:34:18.966 AM), "user3", "TD",
        datetime(4-25-2024 4:41:07.381 AM), "user3", "TC",
    ]
    // only some codes are of interest
    | where  EventCode in ("TD", "TC", "SP")
    // TD indicates the start of an event
    | extend Event = iff(EventCode =="TD", "start", "end")
    | sort by User asc, CreatedDate asc
    // 1. calculate the sessions, alternatively take a look at the scan operator
    | extend EventStarted = row_window_session(CreatedDate, 1h, 5m, User != prev(User))
    | extend  day=bin(EventStarted, 1d)
    | summarize EventEnded=max(CreatedDate) by day, User,  EventStarted
    | sort by day, User asc , EventEnded asc 
    // count events per user and day
    | extend event=row_number(1, prev(User) != User)
    | extend Duration= EventEnded-EventStarted, event=strcat("Event", tostring(event))
    // 2. create a dynamic property bag for the events
    | extend bag=bag_pack("Event", event, "From", tostring(EventStarted), "To", tostring(EventEnded), "Duration", tostring(Duration)) 
    | summarize take_any(bag) by User, day, event
    // 3. pivot the events
    | evaluate pivot(event, take_any(bag))
    

    With this you will get colums Event1, Event2, … with the event-content in the property bags.

    enter image description here

    You could transform the content into columns with the bag_unpack function, but this would only work if you know the number of events:

    | evaluate bag_unpack(Event1, "Event1") 
    | evaluate bag_unpack(Event2, "Event2")
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search