I have a python pandas dataframe, which should be converted to a nested JSON (based on the entry found in the column ‘laterality’).
My dataframe looks like this:
import pandas as pd
df1 = pd.DataFrame({'id': [1, 1, 1, 2],
'laterality': ['L', 'L', 'R', 'R'],
'image': ['im1', 'im2', 'im3', 'im4'],
'number': [5, 5, 5, 6] })
The desired JSON output then is:
[
{
"id": 1,
"number": 5
"images L": ["im1", "im2"],
"images R": ["im3"]
},
{
"id": 2,
"number": 6,
"images R": ["im4"]
},
...
]
What I tried:
I tried to group records based on the .to_dict() method like below. Unfortunately this didn’t solve my problem yet. Since I’m currently stuck, I am eager to learn from your advise. I’m looking for a push in the right direction.
json1 = (df1.groupby(['id','number'])
.apply(lambda x: x[['laterality','image']].to_dict('records'))
.reset_index()
.rename(columns={0:'image'})
.to_json(orient='records'))
Some related topics I looked at:
more pythonic way to format a JSON string from a list of tuples
2
Answers
You can group by
['id', 'number', 'laterality']
and aggregate withlist
. After that pivot to get images as columns:Edit: replacing
groupby
+pivot
bypivot_table
withaggfunc
Output:
It is simple and faster to just iterate the grouped dataframe
Output