I have a json that is three levels deep.
I want to flatten it into a dataframe that has five columns.
id | name | code | level | parent_id |
---|
So:
The part I struggle with is that I can extract each nested item, but I can’t find an elegant way to keep the "parent_id". There’s got to be more elegant ways of doing this. Any pointers appreciated.
source = response.json()
print ('there are ' + str(len(source)) + ' records')
df_L1 = df
df_L2 = json_normalize(df_L1['subLevelCategories'][0])
df_L3 = json_normalize(df_L2['subLevelCategories'][0]) #store for later !!!!!
df_L2_wrapper = df_L2['id']
df_L3_wrapper = df_L3['id']
df_L2_wrapper.name = 'parent_id'
df_L3_wrapper.name = 'parent_id'
df_L1 = df_L1.head(5)
df_L2 = df_L2.head(5)
df_L3 = df_L3.head(5)
df_L3_wrapper = df_L3_wrapper.head(5)
df_L2_wrapper = df_L2_wrapper.head(5)
# Build of df_L1
df_L1 = df_L1.drop(['subLevelCategories'], axis=1)
df_L1['parentid']=0
# Build of df_L2
df_L2 = df_L2.drop(['name','code','level'], axis=1)
# Rename the Series
df_L2 = json_normalize(df_L2['subLevelCategories'][0])
# Concatenate the DataFrame and the renamed Series
df_L2 = pd.concat([df_L2, df_L2_wrapper], axis=1)
df_L2 = df_L2.drop(['subLevelCategories'], axis=1)
# ////// L2 is built.
# Build of df_L3
df_L3 = df_L3.drop(['subLevelCategories'], axis=1)
df_L3 = pd.concat([df_L3, df_L3_wrapper], axis=1)
df_combined = pd.concat([df_L1, df_L2, df_L3], ignore_index=True)
EDIT: The sample has been corrected by enclosing it with the ‘[‘ and ‘]’
source originates from
response = requests.get(url, headers=headers)
source = response.json()
The sample JSON is as follows:
[
{
"id": 3372,
"name": "Archive",
"code": null,
"level": 1,
"subLevelCategories": [
{
"id": 16708,
"name": ".....",
"code": null,
"level": 2,
"subLevelCategories": [
{
"id": 16727,
"name": ".........",
"code": null,
"level": 3,
"subLevelCategories": null
},
{
"id": 16726,
"name": "........",
"code": null,
"level": 3,
"subLevelCategories": null
}
]
},
{
"id": 16701,
"name": ".......",
"code": null,
"level": 2,
"subLevelCategories": [
{
"id": 16782,
"name": "......",
"code": null,
"level": 3,
"subLevelCategories": null
},
{
"id": 16785,
"name": "......",
"code": null,
"level": 3,
"subLevelCategories": null
}
]
}
]
}
]
2
Answers
Used recursive function to handle the nested Json data,it stores the current
id
,name
,code
,level
, andparent_id
. The parent_id is passed down as a parameter so that we can track the parent-child relationship as we go deeper into the hierarchy.EDIT : check added whether
subLevelCategories
is a list and non-emptyimport pandas as pd
from pandas import json_normalize
Output
Apply a recursive function to the dataframe since json structure is repetitive.