I have a problem. I have a nested JSON file:
json_data = '''
{
  "appVersion": "",
  "device": {
    "model": ""
  },
  "bef": {
    "catalog": ""
  },
  "data": [
    {
      "timestamp": "",
      "label": "",
      "category": ""
    }
  ]
}
'''
I would like to extract all the data, and where a value is nested, the key names should be joined with a _.
I have tried to normalise the nested JSON file using json_normalize.
Unfortunately, the output is not what I want and need.
Furthermore, it should work for any number of nested values, so I tried to solve it with a loop.
How can I produce the desired output?
import pandas as pd
import json

json_data = '''
{
  "appVersion": "0.0.3",
  "device": {
    "model": "Lenovo"
  },
  "bef": {
    "catalog": "Manual"
  },
  "data": [
    {
      "timestamp": "2024-04-24 12:08:02.415077",
      "label": "zuf",
      "category": "50"
    }
  ]
}
'''

parsed_json = json.loads(json_data)

def extract_metadata(json_data):
    metadata = {}
    for key, value in json_data.items():
        if isinstance(value, dict):
            for k, v in value.items():
                metadata[f'{key}_{k}'] = v
        else:
            metadata[key] = value
    return metadata

meta_data = extract_metadata(parsed_json)
df_main = pd.json_normalize(parsed_json['data'], sep='_')
df_meta = pd.DataFrame([meta_data])
df = pd.concat([df_main, df_meta], axis=1)
print(df)
What I got
timestamp label category appVersion device_model
0 2024-04-24 12:08:02.415077 zuf 50 0.0.3 Lenovo
bef_catalog data
0 Manual [{'timestamp': '2024-04-24 12:08:02.415077', '...
What I want
appVersion  device_model  bef_catalog  data_timestamp           data_label  data_category
0.0.3       Lenovo        Manual       2024-04-24 12:08:02.415  zuf         50
Answers
You can first flatten your dictionary like this:
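A minimal sketch of such a flattening function, assuming a recursive helper called flatten (a hypothetical name, not from pandas) that joins keys with _ and numbers list items with enumerate:

```python
import json


def flatten(obj, parent_key="", sep="_"):
    """Recursively flatten nested dicts and lists into a single-level dict."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten(value, new_key, sep))
    elif isinstance(obj, list):
        # enumerate gives each list element its own index in the key
        for i, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{i}" if parent_key else str(i)
            items.update(flatten(value, new_key, sep))
    else:
        # Leaf value: store it under the accumulated key
        items[parent_key] = obj
    return items


# Shortened sample input, for illustration only
sample = json.loads(
    '{"appVersion": "0.0.3", "device": {"model": "Lenovo"}, "data": [{"label": "zuf"}]}'
)
flat = flatten(sample)
print(flat)
# {'appVersion': '0.0.3', 'device_model': 'Lenovo', 'data_0_label': 'zuf'}
```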
In your code you missed the list type. I added the enumerator since it is possible to have multiple dictionaries in a list. If you are sure that there is always at most one element per list, you can remove the i, or include a check that the list has length 1.
To convert it to pandas:
The output of the function:
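As a sketch of both steps, assuming the flattening keeps the list index in the key (so the keys read data_0_... rather than data_...), the flattened dictionary for the question's input and its conversion to a one-row DataFrame could look like this:

```python
import pandas as pd

# Flattened dictionary for the question's JSON, assuming list items
# are numbered with their index (data_0_...).
flat = {
    "appVersion": "0.0.3",
    "device_model": "Lenovo",
    "bef_catalog": "Manual",
    "data_0_timestamp": "2024-04-24 12:08:02.415077",
    "data_0_label": "zuf",
    "data_0_category": "50",
}

# Wrapping the dict in a list makes pandas treat it as a single row
df = pd.DataFrame([flat])
print(df.columns.tolist())
# ['appVersion', 'device_model', 'bef_catalog',
#  'data_0_timestamp', 'data_0_label', 'data_0_category']
```

If the index is dropped from the keys, the column names match the desired data_timestamp, data_label and data_category.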
The problem is that you are trying to normalize the data field, which is a list of dictionaries, and then concatenate it with the metadata. Instead, iterate over the data list and, for each dictionary in it, merge in the metadata and append the result to your dataframe:
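One way this could look, as a sketch rather than the original answer's code: the metadata loop is taken from the question, the data key is skipped, and the data_ prefix on the per-entry keys is an assumption chosen to match the desired column names.

```python
import json
import pandas as pd

json_data = '''
{
  "appVersion": "0.0.3",
  "device": {"model": "Lenovo"},
  "bef": {"catalog": "Manual"},
  "data": [
    {"timestamp": "2024-04-24 12:08:02.415077", "label": "zuf", "category": "50"}
  ]
}
'''
parsed_json = json.loads(json_data)

# Collect the metadata: everything except the "data" list, one level deep
meta_data = {}
for key, value in parsed_json.items():
    if key == "data":
        continue
    if isinstance(value, dict):
        for k, v in value.items():
            meta_data[f"{key}_{k}"] = v
    else:
        meta_data[key] = value

# One row per entry in "data": metadata merged with the prefixed entry fields
rows = []
for entry in parsed_json["data"]:
    prefixed = {f"data_{k}": v for k, v in entry.items()}
    rows.append({**meta_data, **prefixed})

df = pd.DataFrame(rows)
print(df.columns.tolist())
# ['appVersion', 'device_model', 'bef_catalog',
#  'data_timestamp', 'data_label', 'data_category']
```

Building a list of row dictionaries and creating the DataFrame once avoids appending to a DataFrame inside the loop.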