skip to Main Content

I have a lot of log data that has some duplicate values except for two columns. The simplified query looks like this:

AzureActivity
| project TimeGenerated, CorrelationId, Caller, Action, RoleId, ObjectId

This gives me a table that looks like this:

TimeGenerated CorrelationId Caller Action RoleId ObjectId Status
8/13/2024, 4:09:34.099 PM 1 John Doe Delete 222 333 Success
8/13/2024, 4:09:35.099 PM 1 John Doe Delete Start
8/15/2024, 8:09:34.099 PM 2 John Doe Write 444 555 Success
8/15/2024, 8:09:35.099 PM 2 John Doe Write Start
8/19/2024, 1:09:34.099 PM 3 John Doe Write 666 777 Success
8/19/2024, 1:09:36.099 PM 3 John Doe Write Start

I would like the table to look like this:

TimeGenerated CorrelationId Caller Action RoleId ObjectId Status
8/13/2024, 4:09:34.099 PM 1 John Doe Delete 222 333 Success
8/15/2024, 8:09:34.099 PM 2 John Doe Write 444 555 Success
8/19/2024, 1:09:34.099 PM 3 John Doe Write 666 777 Success

Anyone has any tips on how to get this to work?

I have tried to summarize by the correlation ID, but I will either still have duplicate rows or missing rows. I have tried to look for other solutions, but can’t seem to find anything that works for this specific case.

2

Answers


  1. you could try using the take_any() aggregation function – which could work for the data set you’ve provided (it may not work for you actual data set, but since you didn’t provide that, it’s not possible to determine that based on information provided in your question)

    datatable(TimeGenerated:datetime, CorrelationId:long, Caller:string, Action:string, RoleId:long, ObjectId:long, Status:string)
    [
        datetime(8/13/2024, 4:09:34.099 PM), 1, 'John Doe', 'Delete', 222,  333, 'Success',
        datetime(8/13/2024, 4:09:35.099 PM), 1, 'John Doe', 'Delete', long(null), long(null), 'Start',
        datetime(8/15/2024, 8:09:34.099 PM), 2, 'John Doe', 'Write',  444,  555, 'Success',
        datetime(8/15/2024, 8:09:35.099 PM), 2, 'John Doe', 'Write',long(null), long(null), 'Start',
        datetime(8/19/2024, 1:09:34.099 PM), 3, 'John Doe', 'Write',  666,  777, 'Success',
        datetime(8/19/2024, 1:09:36.099 PM), 3, 'John Doe', 'Write',long(null), long(null), 'Start',
    ]
    | summarize take_any(*) by CorrelationId
    
    CorrelationId TimeGenerated Caller Action RoleId ObjectId Status
    1 2024-08-13 16:09:34.0990000 John Doe Delete 222 333 Success
    2 2024-08-15 20:09:34.0990000 John Doe Write 444 555 Success
    3 2024-08-19 13:09:34.0990000 John Doe Write 666 777 Success
    Login or Signup to reply.
  2. Another (similar) solution would be to control which elements take_any() will choose with prev() and order by, like that:

    datatable(TimeGenerated:datetime, CorrelationId:long, Caller:string, Action:string, RoleId:long, ObjectId:long, Status:string)
    [
        datetime(8/13/2024, 4:09:35.099 PM), 1, 'John Doe', 'Delete', 111, long(null), 'Start',
        datetime(8/19/2024, 1:09:36.099 PM), 3, 'John Doe', 'Write', long(null), long(null), 'Start',
        datetime(8/15/2024, 8:09:35.099 PM), 2, 'John Doe', 'Write', long(null), long(null), 'Start',
        datetime(8/13/2024, 4:09:34.099 PM), 1, 'John Doe', 'Delete', 222,  333, 'Success',
        datetime(8/15/2024, 8:09:34.099 PM), 2, 'John Doe', 'Write',  444,  555, 'Success',
        datetime(8/19/2024, 1:09:34.099 PM), 3, 'John Doe', 'Write',  666,  777, 'Success',
    ]
    | order by CorrelationId, RoleId, ObjectId
    | where prev(CorrelationId) != CorrelationId
    | summarize take_any(*) by CorrelationId
    

    This Query will only choose non-empty fields (if existing) of RoleId and ObjectId. Without that take_any() would just choose random matching CorrelationIds.

    enter image description here

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search