Azure - KQL merge rows with the same ID into one row

LucaDessauvagie
August 14, 2024
171 views
0 votes
2 Answers

I have a lot of log data that has some duplicate values except for two columns. The simplified query looks like this:

AzureActivity
| project TimeGenerated, CorrelationId, Caller, Action, RoleId, ObjectId, Status

This gives me a table that looks like this:

TimeGenerated	CorrelationId	Caller	Action	RoleId	ObjectId	Status
8/13/2024, 4:09:34.099 PM	1	John Doe	Delete	222	333	Success
8/13/2024, 4:09:35.099 PM	1	John Doe	Delete			Start
8/15/2024, 8:09:34.099 PM	2	John Doe	Write	444	555	Success
8/15/2024, 8:09:35.099 PM	2	John Doe	Write			Start
8/19/2024, 1:09:34.099 PM	3	John Doe	Write	666	777	Success
8/19/2024, 1:09:36.099 PM	3	John Doe	Write			Start

I would like the table to look like this:

TimeGenerated	CorrelationId	Caller	Action	RoleId	ObjectId	Status
8/13/2024, 4:09:34.099 PM	1	John Doe	Delete	222	333	Success
8/15/2024, 8:09:34.099 PM	2	John Doe	Write	444	555	Success
8/19/2024, 1:09:34.099 PM	3	John Doe	Write	666	777	Success

Does anyone have any tips on how to get this to work?

I have tried to summarize by the correlation ID, but I will either still have duplicate rows or missing rows. I have tried to look for other solutions, but can’t seem to find anything that works for this specific case.

Answers

You could try using the take_any() aggregation function, which works for the data set you’ve provided. It may not work for your actual data set, but since you didn’t provide that, it’s not possible to tell from the information in your question.

datatable(TimeGenerated:datetime, CorrelationId:long, Caller:string, Action:string, RoleId:long, ObjectId:long, Status:string)
[
    datetime(8/13/2024, 4:09:34.099 PM), 1, 'John Doe', 'Delete', 222,  333, 'Success',
    datetime(8/13/2024, 4:09:35.099 PM), 1, 'John Doe', 'Delete', long(null), long(null), 'Start',
    datetime(8/15/2024, 8:09:34.099 PM), 2, 'John Doe', 'Write',  444,  555, 'Success',
    datetime(8/15/2024, 8:09:35.099 PM), 2, 'John Doe', 'Write',long(null), long(null), 'Start',
    datetime(8/19/2024, 1:09:34.099 PM), 3, 'John Doe', 'Write',  666,  777, 'Success',
    datetime(8/19/2024, 1:09:36.099 PM), 3, 'John Doe', 'Write',long(null), long(null), 'Start',
]
| summarize take_any(*) by CorrelationId

CorrelationId	TimeGenerated	Caller	Action	RoleId	ObjectId	Status
1	2024-08-13 16:09:34.0990000	John Doe	Delete	222	333	Success
2	2024-08-15 20:09:34.0990000	John Doe	Write	444	555	Success
3	2024-08-19 13:09:34.0990000	John Doe	Write	666	777	Success
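
Because take_any(*) returns an arbitrarily chosen row per group, the result above is not guaranteed. A minimal sketch of a deterministic variant, assuming the row you want is always the earliest one for each CorrelationId (true in the sample data, not necessarily in real logs), uses arg_min():

```kusto
datatable(TimeGenerated:datetime, CorrelationId:long, Caller:string, Action:string, RoleId:long, ObjectId:long, Status:string)
[
    datetime(2024-08-13 16:09:34.099), 1, 'John Doe', 'Delete', 222, 333, 'Success',
    datetime(2024-08-13 16:09:35.099), 1, 'John Doe', 'Delete', long(null), long(null), 'Start',
    datetime(2024-08-15 20:09:34.099), 2, 'John Doe', 'Write', 444, 555, 'Success',
    datetime(2024-08-15 20:09:35.099), 2, 'John Doe', 'Write', long(null), long(null), 'Start',
    datetime(2024-08-19 13:09:34.099), 3, 'John Doe', 'Write', 666, 777, 'Success',
    datetime(2024-08-19 13:09:36.099), 3, 'John Doe', 'Write', long(null), long(null), 'Start',
]
// Keep the whole row with the smallest TimeGenerated in each CorrelationId group.
// Assumption: the populated row is the earliest one in its group.
| summarize arg_min(TimeGenerated, *) by CorrelationId
```

If the populated row is not reliably the earliest, filtering first (for example with | where Status == 'Success') may be the simpler option.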

Another (similar) solution is to control which row take_any() picks by combining order by with prev(), like this:

datatable(TimeGenerated:datetime, CorrelationId:long, Caller:string, Action:string, RoleId:long, ObjectId:long, Status:string)
[
    datetime(8/13/2024, 4:09:35.099 PM), 1, 'John Doe', 'Delete', 111, long(null), 'Start',
    datetime(8/19/2024, 1:09:36.099 PM), 3, 'John Doe', 'Write', long(null), long(null), 'Start',
    datetime(8/15/2024, 8:09:35.099 PM), 2, 'John Doe', 'Write', long(null), long(null), 'Start',
    datetime(8/13/2024, 4:09:34.099 PM), 1, 'John Doe', 'Delete', 222,  333, 'Success',
    datetime(8/15/2024, 8:09:34.099 PM), 2, 'John Doe', 'Write',  444,  555, 'Success',
    datetime(8/19/2024, 1:09:34.099 PM), 3, 'John Doe', 'Write',  666,  777, 'Success',
]
| order by CorrelationId, RoleId, ObjectId
| where prev(CorrelationId) != CorrelationId
| summarize take_any(*) by CorrelationId

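This query picks, for each CorrelationId, the row with non-empty RoleId and ObjectId where one exists: order by sorts in descending order with nulls last by default, so populated rows come first within each CorrelationId, and the prev() filter keeps only that first row. Without the order by and prev() steps, take_any() would just return an arbitrary row for each CorrelationId.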
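
An alternative that does not depend on row order is to aggregate each column separately. A minimal sketch, assuming Caller and Action are identical across all rows sharing a CorrelationId and that 'Start' is the only status you want to discard (both assumptions are drawn from the sample data, not from the real logs):

```kusto
datatable(TimeGenerated:datetime, CorrelationId:long, Caller:string, Action:string, RoleId:long, ObjectId:long, Status:string)
[
    datetime(2024-08-13 16:09:35.099), 1, 'John Doe', 'Delete', 111, long(null), 'Start',
    datetime(2024-08-19 13:09:36.099), 3, 'John Doe', 'Write', long(null), long(null), 'Start',
    datetime(2024-08-15 20:09:35.099), 2, 'John Doe', 'Write', long(null), long(null), 'Start',
    datetime(2024-08-13 16:09:34.099), 1, 'John Doe', 'Delete', 222, 333, 'Success',
    datetime(2024-08-15 20:09:34.099), 2, 'John Doe', 'Write', 444, 555, 'Success',
    datetime(2024-08-19 13:09:34.099), 3, 'John Doe', 'Write', 666, 777, 'Success',
]
| summarize
    TimeGenerated = min(TimeGenerated),             // earliest timestamp in the group
    RoleId = max(RoleId),                           // max() skips null values
    ObjectId = max(ObjectId),
    Status = take_anyif(Status, Status != 'Start')  // any status other than 'Start'
    by CorrelationId, Caller, Action
```

Unlike take_any(*), this builds one merged row per CorrelationId from several source rows, so the values in a result row need not all come from the same log entry.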
