I have a data structure which is a very complex/nested JSON.
After reading other solutions on SO, I am comfortable with the fact that I may need to make multiple DFs and merge them together (e.g json_normalize with multiple record paths).
{
"topLevelIdentifier": "a86507",
"activities": [
{
"activityId": "1f13d848",
"taskList": [
{
"taskIdentifier": "c6ba-1012",
"taskTime": {
"value": 0,
"unit": "SECONDS"
},
"taskLocationIdentifier": "3438"
},
{
"taskIdentifier": "cc48-07bf",
"taskTime": {
"value": 0,
"unit": "SECONDS"
},
"taskLocationIdentifier": "4b32"
},
{
"taskIdentifier": "35b896",
"taskTime": {
"value": 0,
"unit": "SECONDS"
},
"taskLocationIdentifier": "7bcca"
}
],
"sectionB": {
"someCapacityA": {
"totalVolume": {
"value": 8415,
"unit": "CU_CM"
},
"totalWeight": {
"value": 1059.0,
"unit": "GRAMS"
}
},
"someCapacityB": {
"totalVolume": {
"value": 0.0,
"unit": "CU_CM"
},
"totalWeight": {
"value": 0.0,
"unit": "GRAMS"
}
}
},
"sectionC": {....},
"sectionD": {....},
"sectonE":{...}
}]}
I can create a base for the merges by doing this:
with open('my_file.json') as simulation_file:
simulation_data = json.load(simulation_file)
df_base = pd.json_normalize(simulation_data, 'activities',['topLevelIdentifier'])
Then normalizing the next part isn’t too bad:
df_taskList = pd.json_normalize(simulation_data['activities'],
'taskList',['activityId'])
It’s normalizing the next section that has me stuck. Here’s what I’m trying but is throwing an error:
df_sectionB = pd.json_normalize(simulation_data['activities'],
'activityId',['taskList',['taskIdentifier',['taskTime',['unit', 'value']], 'taskLocationIdentifier']])
I’m basically trying to flatten out sectionB
such that it has the activityId
so that I can merge all the pieces together.
If there’s a simpler way to flatten out to the deepest layer in a complex format like this, I’m open to alternative approaches!
2
Answers
You can try flatten_json. I had to remove Sections C, D, E from your example, so it’ s unclear if this will work as you need it to. Refer to documentation for more information.
set your dictionary equal to test_json
You could add
.explode("taskList")
to the result of the first normalize.You can then normalize the
taskList
column, copy the base index, and concat the results.