I have a lot of log data that has some duplicate values except for two columns. The simplified query looks like this:
AzureActivity
| project TimeGenerated, CorrelationId, Caller, Action, RoleId, ObjectId
This gives me a table that looks like this:
TimeGenerated | CorrelationId | Caller | Action | RoleId | ObjectId | Status |
---|---|---|---|---|---|---|
8/13/2024, 4:09:34.099 PM | 1 | John Doe | Delete | 222 | 333 | Success |
8/13/2024, 4:09:35.099 PM | 1 | John Doe | Delete | Start | ||
8/15/2024, 8:09:34.099 PM | 2 | John Doe | Write | 444 | 555 | Success |
8/15/2024, 8:09:35.099 PM | 2 | John Doe | Write | Start | ||
8/19/2024, 1:09:34.099 PM | 3 | John Doe | Write | 666 | 777 | Success |
8/19/2024, 1:09:36.099 PM | 3 | John Doe | Write | Start |
I would like the table to look like this:
TimeGenerated | CorrelationId | Caller | Action | RoleId | ObjectId | Status |
---|---|---|---|---|---|---|
8/13/2024, 4:09:34.099 PM | 1 | John Doe | Delete | 222 | 333 | Success |
8/15/2024, 8:09:34.099 PM | 2 | John Doe | Write | 444 | 555 | Success |
8/19/2024, 1:09:34.099 PM | 3 | John Doe | Write | 666 | 777 | Success |
Anyone has any tips on how to get this to work?
I have tried to summarize by the correlation ID, but I will either still have duplicate rows or missing rows. I have tried to look for other solutions, but can’t seem to find anything that works for this specific case.
2
Answers
you could try using the
take_any()
aggregation function – which could work for the data set you’ve provided (it may not work for you actual data set, but since you didn’t provide that, it’s not possible to determine that based on information provided in your question)Another (similar) solution would be to control which elements
take_any()
will choose withprev()
andorder by
, like that:This Query will only choose non-empty fields (if existing) of
RoleId
andObjectId
. Without thattake_any()
would just choose random matchingCorrelationIds
.