
For several days I have been trying to ingest Apache Avro formatted data from a blob storage into Azure Data Explorer.

I’m able to reference the top-level JSON keys like $.Body (see the red underlined example in the screenshot below), but when it comes to the nested JSON keys, Azure fails to parse them properly and displays nothing (see the green column: I would expect $.Body.entityId to reference the key "entityId" inside the Body JSON).

Many thanks in advance for any help!

Here is a screenshot of the Azure Data Explorer web interface.

Edit 1

I already tried increasing the "Nested levels" option to 2, but all I got was this error message with no further details. The error message won’t even disappear when I decrease the level back to 1; I have to cancel and start the process all over again.

I just noticed that the auto-generated columns have some strange types. It seems like they all end up as type string… That seems a little odd to me as well.

Edit 2

Here is some KQL code.

This is the schema of my input .avro file, as I get it from my Event Hub capture:

{
  SequenceNumber: ...,
  Offset: ...,
  EnqueuedTimeUTC: ...,
  SystemProperties: ...,
  Properties: ...,
  Body: {
    entityId: ...,
    eventTime: ...,
    messageId: ...,
    data: ...
  }
}, ...

And with these ingestion commands I can’t reference the inner JSON keys. The top-level keys work perfectly fine.

// Create table command
////////////////////////////////////////////////////////////
.create table ['test_table']  (['Body']:dynamic, ['entityId']:string)

// Create mapping command
////////////////////////////////////////////////////////////
.create table ['test_table'] ingestion apacheavro mapping 'test_table_mapping' '[{"column":"Body", "Properties":{"Path":"$.Body"}},{"column":"entityId", "Properties":{"Path":"$.Body.entityId"}}]'

// Ingest data into table command
///////////////////////////////////////////////////////////
.ingest async into table ['test_table'] (h'[SAS URL]') with (format='apacheavro',ingestionMappingReference='test_table_mapping',ingestionMappingType='apacheavro',tags="['503a2cfb-5b81-4c07-8658-639009870862']")
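
Side note: since the ingest command runs asynchronously, it returns an operation ID whose status can be checked afterwards. The GUID below is just a placeholder for the ID returned by the command above:

// Check the status of the async ingestion (replace the placeholder
// with the operation ID returned by .ingest async)
.show operations 00000000-0000-0000-0000-000000000000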

I would love to ingest the inner data fields into separate columns, instead of building a workaround with update policies.

3 Answers


  1. Chosen as BEST ANSWER

    For those having the same issue, here is the workaround we currently use:

    First, assume that we want to ingest the contents of the Body field from the Avro file into the table avro_destination.

    Step 1: Create an ingestion table

    .create table avro_ingest(
        Body: dynamic
        // optional other columns, if you want...
    )
    

    Step 2: Create an update policy

    .create-or-alter function
        with (docstring = 'Convert avro_ingest to avro_destination', folder='ingest')
        convert_avro_ingest() {
            avro_ingest
            | extend entityId = tostring(Body.entityId)
            | extend messageId = tostring(Body.messageId)
            | extend eventTime = todatetime(Body.eventTime)
            | extend data = Body.data
            | project entityId, messageId, eventTime, data
        }
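
    The update policy below requires the destination table to exist. Here is a minimal sketch of it, with the column types matching the casts in the function above (pick a concrete type for data if you know its schema):

    .create table avro_destination (
        entityId: string,
        messageId: string,
        eventTime: datetime,
        data: dynamic
    )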
    
    .alter table avro_destination policy update
    @'[{ "IsEnabled": true, "Source": "avro_ingest", "Query": "convert_avro_ingest()", "IsTransactional": false, "PropagateIngestionProperties": true}]'
    
    

    Step 3: Ingest the .avro files into the avro_ingest table

    ...as seen in the question, with one column containing the whole Body JSON per record.
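
    A sketch of the adjusted commands, reusing the commands from the question with the mapping reduced to the single Body column (the SAS URL remains a placeholder):

    .create table ['avro_ingest'] ingestion apacheavro mapping 'avro_ingest_mapping' '[{"column":"Body", "Properties":{"Path":"$.Body"}}]'

    .ingest async into table ['avro_ingest'] (h'[SAS URL]') with (format='apacheavro', ingestionMappingReference='avro_ingest_mapping')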


  2. Simply increase "Nested levels" to 2.

    (screenshot of the "Data ingestion" pane)

  3. Following the OP’s updates:

    Here is the Avro schema of an Event Hubs capture.
    As you can see, Body is of type bytes, so there is practically nothing you can do with it in this form, other than ingesting it as is (as dynamic).

    {
        "type":"record",
        "name":"EventData",
        "namespace":"Microsoft.ServiceBus.Messaging",
        "fields":[
            {"name":"SequenceNumber","type":"long"},
            {"name":"Offset","type":"string"},
            {"name":"EnqueuedTimeUtc","type":"string"},
            {"name":"SystemProperties","type":{"type":"map","values":["long","double","string","bytes"]}},
            {"name":"Properties","type":{"type":"map","values":["long","double","string","bytes"]}},
            {"name":"Body","type":["null","bytes"]}
        ]
    }
    

    If you take a look at the ingested data, you’ll see that the content of Body is an array of integers.
    Those integers are the decimal values of the characters that make up Body.

    capture
    | project Body
    | take 3
    
    Body
    [123,34,105,100,34,58,32,34,56,49,55,98,50,99,100,57,45,97,98,48,49,45,52,100,51,53,45,57,48,51,54,45,100,57,55,50,51,55,55,98,54,56,50,57,34,44,32,34,100,116,34,58,32,34,50,48,50,49,45,48,56,45,49,50,84,49,54,58,52,56,58,51,50,46,53,57,54,50,53,52,34,44,32,34,105,34,58,32,48,44,32,34,109,121,105,110,116,34,58,32,50,48,44,32,34,109,121,102,108,111,97,116,34,58,32,48,46,51,57,56,53,52,52,56,55,52,53,57,56,57,48,55,57,55,125]
    [123,34,105,100,34,58,32,34,57,53,100,52,100,55,56,48,45,97,99,100,55,45,52,52,57,50,45,98,97,54,100,45,52,56,49,54,97,51,56,100,52,56,56,51,34,44,32,34,100,116,34,58,32,34,50,48,50,49,45,48,56,45,49,50,84,49,54,58,52,56,58,51,50,46,53,57,54,50,53,52,34,44,32,34,105,34,58,32,49,44,32,34,109,121,105,110,116,34,58,32,56,56,44,32,34,109,121,102,108,111,97,116,34,58,32,48,46,54,53,53,51,55,51,51,56,49,57,54,53,50,52,52,49,125]
    [123,34,105,100,34,58,32,34,53,50,100,49,102,54,54,53,45,102,57,102,54,45,52,49,50,49,45,97,50,57,99,45,55,55,56,48,102,101,57,53,53,55,48,56,34,44,32,34,100,116,34,58,32,34,50,48,50,49,45,48,56,45,49,50,84,49,54,58,52,56,58,51,50,46,53,57,54,50,53,52,34,44,32,34,105,34,58,32,50,44,32,34,109,121,105,110,116,34,58,32,49,57,44,32,34,109,121,102,108,111,97,116,34,58,32,48,46,52,53,57,54,49,56,54,51,49,51,49,50,50,52,50,50,51,125]

    Body can be converted to text using make_string() and then parsed into JSON using todynamic():

    capture
    | project BodyJSON = todynamic(make_string(Body))
    | take 3
    
    BodyJSON
    {"id":"817b2cd9-ab01-4d35-9036-d972377b6829","dt":"2021-08-12T16:48:32.5962540Z","i":0,"myint":20,"myfloat":"0.398544874598908"}
    {"id":"95d4d780-acd7-4492-ba6d-4816a38d4883","dt":"2021-08-12T16:48:32.5962540Z","i":1,"myint":88,"myfloat":"0.65537338196524408"}
    {"id":"52d1f665-f9f6-4121-a29c-7780fe955708","dt":"2021-08-12T16:48:32.5962540Z","i":2,"myint":19,"myfloat":"0.45961863131224223"}