Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

Dataframe to nested JSON (grouped by row entry)

Jeromekem
January 12, 2023
253 views
1 vote
2 Answers

I have a python pandas dataframe, which should be converted to a nested JSON (based on the entry found in the column ‘laterality’).

My dataframe looks like this:

import pandas as pd

df1 = pd.DataFrame({'id': [1, 1, 1, 2],
                   'laterality': ['L', 'L', 'R', 'R'],
                   'image': ['im1', 'im2', 'im3', 'im4'],
                   'number': [5, 5, 5, 6] })

The desired JSON output then is:

[
  {
    "id": 1,
    "number": 5
    "images L": ["im1", "im2"],
    "images R": ["im3"]    
  },
  {
    "id": 2,
    "number": 6,
    "images R": ["im4"]    
  },
...
]

What I tried:
I tried to group records based on the .to_dict() method like below. Unfortunately this didn’t solve my problem yet. Since I’m currently stuck, I am eager to learn from your advise. I’m looking for a push in the right direction.

json1 = (df1.groupby(['id','number'])
       .apply(lambda x: x[['laterality','image']].to_dict('records'))
       .reset_index()
       .rename(columns={0:'image'})
       .to_json(orient='records'))

Some related topics I looked at:

more pythonic way to format a JSON string from a list of tuples

Convert Pandas Dataframe to nested JSON

Convert Pandas DataFrame to JSON format

Answers

You can group by ['id', 'number', 'laterality'] and aggregate with list. After that pivot to get images as columns:

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2],
                   'laterality': ['L', 'L', 'R', 'R'],
                   'image': ['im1', 'im2', 'im3', 'im4'],
                   'number': [5, 5, 5, 6] })

df_out = df.pivot_table(columns='laterality', index=['id','number'], values='image', aggfunc=list)

df_out.columns = "images " + df_out.columns

df_out.reset_index().to_json(orient='records')

Edit: replacing groupby + pivot by pivot_table with aggfunc

Output:

[{"id":1,"number":5,"images L":["im1","im2"],"images R":["im3"]},{"id":2,"number":6,"images L":null,"images R":["im4"]}]

It is simple and faster to just iterate the grouped dataframe

out = []
for (index, num), group in df1.groupby(['id', 'number']):
    out.append({
        "id": index,
        "number": num,
        "images L": group.loc[group.laterality == 'L', 'image'].tolist(),
        "images R": group.loc[group.laterality == 'R', 'image'].tolist()
    })

Output

[{'id': 1, 'number': 5, 'images L': ['im1', 'im2'], 'images R': ['im3']},
 {'id': 2, 'number': 6, 'images L': [], 'images R': ['im4']}]

Please signup or login to give your own answer.

Click here to cancel reply.