skip to Main Content

I have data arriving as separate events in JSON form resembling:

{
"id":1234,
"data":{
    "packet1":{"name":"packet1", "value":1},
    "packet2":{"name":"packet2", "value":2}
     }
}

I’d like to unnest the data to essentially have one row per ‘packet’ (there may be any number of packets).

id name value
1234 packet1 1
1234 packet2 2

I’ve looked at using the unnest function with the various JSON functions but it seems limited to working with arrays. I have not been able to find a way to treat the ‘data’ field as if it were an array.

At the moment, I cannot change these events to store packets in an array, and ideally the unnesting should be happening within BigQuery.

2

Answers


  1. Chosen as BEST ANSWER

    @Jaytiger's suggestion of unnesting a regex extract led me to the following solution. The example I showed was simplified - there are more fields within the packets. To avoid requiring separate regex for each field name, I used regex to split/extract each individual packet, and then read the JSON.

    This iteration doesn't do everything in one step but works when just looking at packets.

    with sample_data
    AS (SELECT """{"packet1":{"name":"packet1", "value":1},
                   "packet2":{"name":"packet2", "value":2}}""" as packets)
    
    select
        json_value('{'||packet||'}', "$.name") name,
        json_value('{'||packet||'}', "$.value") value
    from sample_data,
    unnest(regexp_extract_all(packets, r':{(.*?)}')) packet
    

  2. 1. Regular expressions

    There might be other ways but you can consider below approach using regular expressions.

    WITH sample_table AS (
      SELECT """{
        "id":1234,
        "data":{
          "packet1":{"name":"packet1", "value":1},
          "packet2":{"name":"packet2", "value":2}
         }
      }""" AS events
    )
    SELECT JSON_VALUE(events, '$.id') AS id, name, value
      FROM sample_table,
           UNNEST (REGEXP_EXTRACT_ALL(events, r'"name":"(w+)"')) name WITH offset
      JOIN UNNEST (REGEXP_EXTRACT_ALL(events, r'"value":([0-9.]+)')) value WITH offset
     USING (offset);
    

    Query results

    enter image description here

    2. Javascript UDF

    or, you might consider below using Javascript UDF.

    CREATE TEMP FUNCTION extract_pair(json STRING)
    RETURNS ARRAY<STRUCT<name STRING, value STRING>>
    LANGUAGE js AS """
      result = [];
      for (const [key, value] of Object.entries(JSON.parse(json))) {
        result.push(value);
      }
      return result;
    """;
    
    WITH sample_table AS (
      SELECT """{
        "id":1234,
        "data":{
          "packet1":{"name":"packet1", "value":1},
          "packet2":{"name":"packet2", "value":2}
         }
      }""" AS events
    )
    SELECT JSON_VALUE(events, '$.id') AS id, obj.*
      FROM sample_table, UNNEST(extract_pair(JSON_QUERY(events, '$.data'))) obj;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search