skip to Main Content

I have a json that is three levels deep.
I want to flatten it into a dataframe that has five columns.

id name code level parent_id

So:
The part I struggle with is that I can extract each nested item, but I can’t find an elegant way to keep the "parent_id". There’s got to be more elegant ways of doing this. Any pointers appreciated.

source = response.json()
print ('there are ' + str(len(source)) + ' records')    

df_L1 = df
df_L2 = json_normalize(df_L1['subLevelCategories'][0])   
df_L3 = json_normalize(df_L2['subLevelCategories'][0])   #store for later   !!!!!

df_L2_wrapper = df_L2['id']
df_L3_wrapper = df_L3['id']
df_L2_wrapper.name = 'parent_id'
df_L3_wrapper.name = 'parent_id'
    
df_L1 = df_L1.head(5)
df_L2 = df_L2.head(5)
df_L3 = df_L3.head(5)
df_L3_wrapper = df_L3_wrapper.head(5)
df_L2_wrapper = df_L2_wrapper.head(5)

# Build of df_L1
df_L1 = df_L1.drop(['subLevelCategories'], axis=1) 
df_L1['parentid']=0

# Build of df_L2
df_L2 = df_L2.drop(['name','code','level'], axis=1) 
 


# Rename the Series
df_L2 = json_normalize(df_L2['subLevelCategories'][0])
# Concatenate the DataFrame and the renamed Series
df_L2 = pd.concat([df_L2, df_L2_wrapper], axis=1)
df_L2 = df_L2.drop(['subLevelCategories'], axis=1) 
# //////    L2 is built. 

# Build of df_L3
df_L3 = df_L3.drop(['subLevelCategories'], axis=1) 
df_L3 = pd.concat([df_L3, df_L3_wrapper], axis=1)

df_combined = pd.concat([df_L1, df_L2, df_L3], ignore_index=True)

EDIT: The sample has been corrected by enclosing it with the ‘[‘ and ‘]’

source originates from

response = requests.get(url, headers=headers)
source = response.json()

The sample JSON is as follows:

[
{
  "id": 3372,
  "name": "Archive",
  "code": null,
  "level": 1,
  "subLevelCategories": [
    {
      "id": 16708,
      "name": ".....",
      "code": null,
      "level": 2,
      "subLevelCategories": [
        {
          "id": 16727,
          "name": ".........",
          "code": null,
          "level": 3,
          "subLevelCategories": null
        },
        {
          "id": 16726,
          "name": "........",
          "code": null,
          "level": 3,
          "subLevelCategories": null
        }
      ]
    },
    {
      "id": 16701,
      "name": ".......",
      "code": null,
      "level": 2,
      "subLevelCategories": [
        {
          "id": 16782,
          "name": "......",
          "code": null,
          "level": 3,
          "subLevelCategories": null
        },
        {
          "id": 16785,
          "name": "......",
          "code": null,
          "level": 3,
          "subLevelCategories": null
        }
      ]
    }
  ]
}
]

2

Answers


  1. Used recursive function to handle the nested Json data,it stores the current id, name, code, level, and parent_id. The parent_id is passed down as a parameter so that we can track the parent-child relationship as we go deeper into the hierarchy.

    EDIT : check added whether subLevelCategories is a list and non-empty
    import pandas as pd
    from pandas import json_normalize

    source = { } # your json data
    
    def flatten_json(data, parent_id=0):
       # Flattens the JSON data recursively while adding parent_id for each level.
     
        flat_data = {
                'id': data['id'],
                'name': data['name'],
                'code': data['code'],
                'level': data['level'],
                'parent_id': parent_id
            }
        
        flat_records = [flat_data]
        
        subcategories = data.get('subLevelCategories', [])
        
        if isinstance(subcategories, list) and subcategories:
            for sub_category in subcategories:
                flat_records.extend(flatten_json(sub_category, parent_id=data['id']))
        
        return flat_records
        flattened_data = flatten_json(source)
        df = pd.DataFrame(flattened_data)
    
    print(df)
    

    Output

          id       name  code  level  parent_id
    0   3372    Archive  None      1          0
    1  16708      .....  None      2       3372
    2  16727  .........  None      3      16708
    3  16726   ........  None      3      16708
    4  16701    .......  None      2       3372
    5  16782     ......  None      3      16701
    6  16785     ......  None      3      16701
    
    Login or Signup to reply.
  2. Apply a recursive function to the dataframe since json structure is repetitive.

    import pandas as pd
    
    def parse_sub_level(data, sub):
        row = [sub[e] for e in ['id', 'name', 'code', 'level']]
        row.append(int(next(x[0] for x in reversed(data) if x[3] < sub['level'])))
        data.append(row)
        if sub['subLevelCategories'] is not None:
            for lev in sub['subLevelCategories']:
                parse_sub_level(data, lev)
    
    data = []
    
    df = pd.read_json('/home/lmc/tmp/test.json')
    
    # populate root element
    data.append(df.loc[0, ['id', 'name', 'code', 'level']].values.flatten().tolist())
    data[0].append(0)
    
    df.apply(lambda x: parse_sub_level(data, x['subLevelCategories']), axis=1)
    
    out = pd.DataFrame(data, columns=['id', 'name', 'code', 'level', 'parent_id'])
    print(out)
    
          id     name  code  level  parent_id
    0   3372  Archive   NaN      1          0
    1  16708       ..   NaN      2       3372
    2  16727      ...   NaN      3      16708
    3  16726      ...   NaN      3      16708
    4  16701       ..   NaN      2       3372
    5  16782      ...   NaN      3      16701
    6  16785      ...   NaN      3      16701
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search