I have many JSON files with the following structure:
{
"requestId": "test",
"executionDate": "2023-05-10",
"executionTime": "12:02:22",
"request": {
"fields": [{
"geometry": {
"type": "Point",
"coordinates": [-90, 41]
},
"colour": "blue",
"bean": "blaCk",
"birthday": "2021-01-01",
"arst": "111",
"arstg": "rst",
"fct": {
"start": "2011-01-10",
"end": "2012-01-10"
}
}]
},
"response": {
"results": [{
"geom": {
"type": "geo",
"coord": [-90, 41]
},
"md": {
"type": "arstat",
"mdl": "trstr",
"vs": "v0",
"cal": {
"num": 4,
"comment": "message"
},
"bean": ["blue", "green"],
"result_time": 12342
},
"predictions": [{
"date": "2004-05-19",
"day": 0,
"count": 0,
"eating_stage": "trt"
}, {
"date": "2002-01-20",
"day": 1,
"count": 0,
"eating_stage": "arstg"
}, {
"date": "2004-05-21",
"day": 2,
"count": 0,
"eating_stage": "strg"
}, {
"date": "2004-05-22",
"day": 3,
"count": 0,
"eating_stage": "rst"
}
}
}
}
The predictions part can be very deep. I want to convert this JSON to a CSV with the following structure:
requestId | executionDate | executionTime | colour | predictions_date | predictions_day | predictions_count | predictions_eating_stage |
---|---|---|---|---|---|---|---|
test | 2023-05-10 | 12:02:22 | blue | 2004-05-19 | 0 | 0 | trt |
test | 2023-05-10 | 12:02:22 | blue | 2002-01-20 | 1 | 0 | astrg |
test | 2023-05-10 | 12:02:22 | blue | 2004-05-21 | 2 | 0 | strg |
test | 2023-05-10 | 12:02:22 | blue | 2004-05-22 | 3 | 0 | rst |
I tried the following code:
flat_json = pd.DataFrame(
flatten(json_data), index=[0]
)
The code results in every data point becoming a column, and I am not sure how to pivot longer where at the ‘predictions’ key using JSON functions in Python. I recognise that at this stage I could pivot longer using column names, but I feel like there is a cleaner way to achieve this.
2
Answers
I would suggest simply extracting what you need. It seems very specific for it to be solved using specific parsing. Therefore I would start by creating two dataframes:
Renaming columns in predictions:
Joining and adding the last piece of data (colour):
Outputting:
You can also use json_normalize to extract the array of records that you want to normalize into a csv.
It is just unfortunate that there is a limitation on the meta fields as it is throwing an exception for a path that includes an array / list so the "colour" column was added separately. If the order is important, then you can rearrange the columns as needed.