For several days I've been trying to ingest Apache Avro formatted data from blob storage into Azure Data Explorer.
I'm able to reference the top-level JSON keys like $.Body (see the red underlined example in the screenshot below), but when it comes to the nested JSON keys, Azure fails to parse them properly and displays nothing (as seen in the green column: I would expect $.Body.entityId to reference the key "entityId" inside the Body JSON).
Many thanks in advance for any help!
Here is a screenshot of the Azure Data Explorer web interface.
Edit 1
I already tried increasing the "Nested levels" option to 2, but all I got was this error message with no further details. The error message won't even disappear when I decrease the level back to 1; I have to cancel and start the process all over again.
I just noticed that the auto-generated columns have some strange types. It seems like they all add up to the type string… This also seems a little odd to me.
Edit 2
Here is some KQL code.
This is the schema of my input .avro file, which I get from my Event Hubs capture:
{
    SequenceNumber: ...,
    Offset: ...,
    EnqueuedTimeUTC: ...,
    SystemProperties: ...,
    Properties: ...,
    Body: {
        entityId: ...,
        eventTime: ...,
        messageId: ...,
        data: ...
    }
}, ...
With these ingestion commands I can't reference the inner JSON keys. The top-level keys work perfectly fine.
// Create table command
////////////////////////////////////////////////////////////
.create table ['test_table'] (['Body']:dynamic, ['entityId']:string)
// Create mapping command
////////////////////////////////////////////////////////////
.create table ['test_table'] ingestion apacheavro mapping 'test_table_mapping' '[{"column":"Body", "Properties":{"Path":"$.Body"}},{"column":"entityId", "Properties":{"Path":"$.Body.entityId"}}]'
// Ingest data into table command
///////////////////////////////////////////////////////////
.ingest async into table ['test_table'] (h'[SAS URL]') with (format='apacheavro',ingestionMappingReference='test_table_mapping',ingestionMappingType='apacheavro',tags="['503a2cfb-5b81-4c07-8658-639009870862']")
I would love to ingest the inner data fields into separate columns, instead of building a workaround with update policies.
3 Answers
For those having the same issue, here is the workaround we currently use:
First, assume that we want to ingest the contents of the Body field from the .avro file into the table avro_destination.

Step 1: Create an ingestion table
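A minimal sketch of what the intermediate and destination tables could look like, assuming the capture schema from the question; the table names, column names and types here are illustrative, not the OP's actual definitions:

// Intermediate table that receives the raw Avro records as-is.
.create table avro_ingest (
    SequenceNumber: long,
    Offset: string,
    EnqueuedTimeUTC: datetime,
    SystemProperties: dynamic,
    Properties: dynamic,
    Body: dynamic
)

// Destination table with the inner Body fields as separate columns.
.create table avro_destination (
    entityId: string,
    eventTime: datetime,
    messageId: string,
    data: dynamic
)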
Step 2: Create an update policy
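A sketch of such a policy, assuming a helper function named ExpandAvroBody (the function name and the exact transformation are assumptions; the make_string()/todynamic() step is only needed if Body arrives as a byte array rather than parsed JSON):

// Hypothetical function that expands Body into the destination schema.
.create-or-alter function ExpandAvroBody() {
    avro_ingest
    | extend ParsedBody = todynamic(make_string(Body))
    | project entityId = tostring(ParsedBody.entityId),
              eventTime = todatetime(ParsedBody.eventTime),
              messageId = tostring(ParsedBody.messageId),
              data = ParsedBody.data
}

// Attach the update policy: every ingestion into avro_ingest also populates avro_destination.
.alter table avro_destination policy update
@'[{"IsEnabled": true, "Source": "avro_ingest", "Query": "ExpandAvroBody()", "IsTransactional": false}]'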
Step 3: Ingest the .avro files into the avro_ingest table, as seen in the question, with one column containing the whole Body JSON per entry.
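The ingest command from the question can then be reused, only pointed at the intermediate table (the mapping name avro_ingest_mapping is a placeholder, and [SAS URL] stays elided):

.ingest async into table avro_ingest (h'[SAS URL]') with (
    format='apacheavro',
    ingestionMappingReference='avro_ingest_mapping'
)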
Simply increase "Nested levels" to 2.
Following the OP updates:

Here is the Avro schema of an Event Hubs capture.

As you can see, Body is of type bytes, so there is practically nothing you can do with it in this form, other than ingesting it as is (as dynamic).

If you take a look at the ingested data, you'll see that the content of Body is arrays of integers. Those integers are the decimal values of the characters that construct Body.

Body can be converted to text using make_string() and then parsed to JSON using todynamic().
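For example, a query along these lines decodes and parses it (the table name avro_ingest and the projected column names are assumed from the question, not part of the original answer):

avro_ingest
| extend BodyText = make_string(Body)       // array of character codes -> string
| extend BodyJson = todynamic(BodyText)     // JSON string -> dynamic
| project entityId = tostring(BodyJson.entityId),
          eventTime = BodyJson.eventTime,
          messageId = tostring(BodyJson.messageId),
          data = BodyJson.data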