Normalising a nested json

reddwarfcrew
April 10, 2024
65 views
0 votes
2 Answers

I’m struggling to normalise nested attributes in a JSON using python data frames.

My JSON structure is:

{
   "aaData": [
     {
        "Name": "ABC",
        "NestedAttribute": [
         {
          "id": "123",
          "value": "abc2"
         }
       ]
     },
     {
        "Name": "DEF",
        "NestedAttribute": [
         {
          "id": "456",
          "value": "def2"
         }
       ]
     }
   ]
}

I’m reading in the JSON and converting to dataframe with the following:

after reading in the JSON and converting to a dataframe, I am extracting the records with:

df = pf.json_normlize(df['aaData'])

This gives me two columns, ‘Name’ and ‘NestedAttribute’. I’d like to now split the ‘NestedAttribute’ into the 2 columns (NestedAttribute_id and NestedAttribute_Value).

I’ve tried using normalise a second time, but it’s not working as I think it thinks it’s a nested array due to be enclosed by the []

I know I’m missing something obvious, so hoping someone can put me out of my misery and give me a point.

The output I want to get to is:

Name	NestedAttribute_id	NestedAttribute_value
ABC	123	abc2
DEF	456	def2

Thanks for any help.

Answers

If you look at the documentation for json_normalize, there are examples of how to do exactly what you want. The records you want are in the NestedAttribute key; additional metadata for each record is the Name key.

import pandas as pd

data = {
   "aaData": [
     {
        "Name": "ABC",
        "NestedAttribute": [
         {
          "id": "123",
          "value": "abc2"
         }
       ]
     },
     {
        "Name": "DEF",
        "NestedAttribute": [
         {
          "id": "456",
          "value": "def2"
         }
       ]
     }
   ]
}

pd.json_normalize(data['aaData'], 'NestedAttribute',
    'Name', record_prefix='NestedAttribute_')

This returns:

  NestedAttribute_id NestedAttribute_value Name
0                123                  abc2  ABC
1                456                  def2  DEF

You could convert the input JSON to a dictionary and then use a list comprehension to create a modified dictionary that is input to pandasDataFrame.from_dict:

import pandas as pd
import json

s = """{
   "aaData": [
     {
        "Name": "ABC",
        "NestedAttribute": [
         {
          "id": "123",
          "value": "abc2"
         }
       ]
     },
     {
        "Name": "DEF",
        "NestedAttribute": [
         {
          "id": "456",
          "value": "def2"
         }
       ]
     }
   ]
}"""

aaData = json.loads(s)['aaData']

d = [
    {
        'Name': d1['Name'],
        'NestedAttribute_id': d2['id'],
        'NestedAttribute_value': d2['value']
    }
    for d1 in aaData
    for d2 in d1['NestedAttribute']
]

print(pd.DataFrame.from_dict(d))

Prints:

  Name NestedAttribute_id NestedAttribute_value
0  ABC                123                  abc2
1  DEF                456                  def2

Please signup or login to give your own answer.

Click here to cancel reply.