skip to Main Content

I have a python pandas dataframe, which should be converted to a nested JSON (based on the entry found in the column ‘laterality’).

My dataframe looks like this:

import pandas as pd

df1 = pd.DataFrame({'id': [1, 1, 1, 2],
                   'laterality': ['L', 'L', 'R', 'R'],
                   'image': ['im1', 'im2', 'im3', 'im4'],
                   'number': [5, 5, 5, 6] })

The desired JSON output then is:

[
  {
    "id": 1,
    "number": 5
    "images L": ["im1", "im2"],
    "images R": ["im3"]    
  },
  {
    "id": 2,
    "number": 6,
    "images R": ["im4"]    
  },
...
]

What I tried:
I tried to group records based on the .to_dict() method like below. Unfortunately this didn’t solve my problem yet. Since I’m currently stuck, I am eager to learn from your advise. I’m looking for a push in the right direction.

json1 = (df1.groupby(['id','number'])
       .apply(lambda x: x[['laterality','image']].to_dict('records'))
       .reset_index()
       .rename(columns={0:'image'})
       .to_json(orient='records'))

Some related topics I looked at:

more pythonic way to format a JSON string from a list of tuples

Convert Pandas Dataframe to nested JSON

Convert Pandas DataFrame to JSON format

2

Answers


  1. You can group by ['id', 'number', 'laterality'] and aggregate with list. After that pivot to get images as columns:

    import pandas as pd
    
    df = pd.DataFrame({'id': [1, 1, 1, 2],
                       'laterality': ['L', 'L', 'R', 'R'],
                       'image': ['im1', 'im2', 'im3', 'im4'],
                       'number': [5, 5, 5, 6] })
    
    df_out = df.pivot_table(columns='laterality', index=['id','number'], values='image', aggfunc=list)
    
    df_out.columns = "images " + df_out.columns
    
    df_out.reset_index().to_json(orient='records')
    

    Edit: replacing groupby + pivot by pivot_table with aggfunc

    Output:

    [{"id":1,"number":5,"images L":["im1","im2"],"images R":["im3"]},{"id":2,"number":6,"images L":null,"images R":["im4"]}]
    
    Login or Signup to reply.
  2. It is simple and faster to just iterate the grouped dataframe

    out = []
    for (index, num), group in df1.groupby(['id', 'number']):
        out.append({
            "id": index,
            "number": num,
            "images L": group.loc[group.laterality == 'L', 'image'].tolist(),
            "images R": group.loc[group.laterality == 'R', 'image'].tolist()
        })
    

    Output

    [{'id': 1, 'number': 5, 'images L': ['im1', 'im2'], 'images R': ['im3']},
     {'id': 2, 'number': 6, 'images L': [], 'images R': ['im4']}]
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search