skip to Main Content

I’m struggling to normalise nested attributes in a JSON using python data frames.

My JSON structure is:

{
   "aaData": [
     {
        "Name": "ABC",
        "NestedAttribute": [
         {
          "id": "123",
          "value": "abc2"
         }
       ]
     },
     {
        "Name": "DEF",
        "NestedAttribute": [
         {
          "id": "456",
          "value": "def2"
         }
       ]
     }
   ]
}

I’m reading in the JSON and converting to dataframe with the following:

after reading in the JSON and converting to a dataframe, I am extracting the records with:

df = pf.json_normlize(df['aaData'])

This gives me two columns, ‘Name’ and ‘NestedAttribute’. I’d like to now split the ‘NestedAttribute’ into the 2 columns (NestedAttribute_id and NestedAttribute_Value).

I’ve tried using normalise a second time, but it’s not working as I think it thinks it’s a nested array due to be enclosed by the []

I know I’m missing something obvious, so hoping someone can put me out of my misery and give me a point.

The output I want to get to is:

Name NestedAttribute_id NestedAttribute_value
ABC 123 abc2
DEF 456 def2

Thanks for any help.

2

Answers


  1. If you look at the documentation for json_normalize, there are examples of how to do exactly what you want. The records you want are in the NestedAttribute key; additional metadata for each record is the Name key.

    import pandas as pd
    
    data = {
       "aaData": [
         {
            "Name": "ABC",
            "NestedAttribute": [
             {
              "id": "123",
              "value": "abc2"
             }
           ]
         },
         {
            "Name": "DEF",
            "NestedAttribute": [
             {
              "id": "456",
              "value": "def2"
             }
           ]
         }
       ]
    }
    
    pd.json_normalize(data['aaData'], 'NestedAttribute',
        'Name', record_prefix='NestedAttribute_')
    

    This returns:

      NestedAttribute_id NestedAttribute_value Name
    0                123                  abc2  ABC
    1                456                  def2  DEF
    
    Login or Signup to reply.
  2. You could convert the input JSON to a dictionary and then use a list comprehension to create a modified dictionary that is input to pandasDataFrame.from_dict:

    import pandas as pd
    import json
    
    s = """{
       "aaData": [
         {
            "Name": "ABC",
            "NestedAttribute": [
             {
              "id": "123",
              "value": "abc2"
             }
           ]
         },
         {
            "Name": "DEF",
            "NestedAttribute": [
             {
              "id": "456",
              "value": "def2"
             }
           ]
         }
       ]
    }"""
    
    aaData = json.loads(s)['aaData']
    
    d = [
        {
            'Name': d1['Name'],
            'NestedAttribute_id': d2['id'],
            'NestedAttribute_value': d2['value']
        }
        for d1 in aaData
        for d2 in d1['NestedAttribute']
    ]
    
    print(pd.DataFrame.from_dict(d))
    

    Prints:

      Name NestedAttribute_id NestedAttribute_value
    0  ABC                123                  abc2
    1  DEF                456                  def2
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search