I’m struggling to normalise nested attributes in a JSON using python data frames.
My JSON structure is:
{
"aaData": [
{
"Name": "ABC",
"NestedAttribute": [
{
"id": "123",
"value": "abc2"
}
]
},
{
"Name": "DEF",
"NestedAttribute": [
{
"id": "456",
"value": "def2"
}
]
}
]
}
I’m reading in the JSON and converting to dataframe with the following:
after reading in the JSON and converting to a dataframe, I am extracting the records with:
df = pf.json_normlize(df['aaData'])
This gives me two columns, ‘Name’ and ‘NestedAttribute’. I’d like to now split the ‘NestedAttribute’ into the 2 columns (NestedAttribute_id and NestedAttribute_Value).
I’ve tried using normalise a second time, but it’s not working as I think it thinks it’s a nested array due to be enclosed by the []
I know I’m missing something obvious, so hoping someone can put me out of my misery and give me a point.
The output I want to get to is:
Name | NestedAttribute_id | NestedAttribute_value |
---|---|---|
ABC | 123 | abc2 |
DEF | 456 | def2 |
Thanks for any help.
2
Answers
If you look at the documentation for
json_normalize
, there are examples of how to do exactly what you want. The records you want are in theNestedAttribute
key; additional metadata for each record is theName
key.This returns:
You could convert the input JSON to a dictionary and then use a list comprehension to create a modified dictionary that is input to
pandasDataFrame.from_dict
:Prints: