skip to Main Content

I have a JSON that I download from a website that has multiple nested dictionaries inside the main list. This is a very simplified version of it.

[
    {
        "id": 1,
        "attributes": {
            "autoNumber": 1,
            "make": "Ford",
            "model": "F150",
            "trim": "Lariat"
            },
            "engine": {
                "data": [
                    {
                        "id": 1,
                        "attributes": {
                            "engine": "5.0l v8 ",
                            "horsePower": "400",
                            "torque": "410"
                        }
                    },
                    {
                        "id": 2,
                        "attributes": {
                            "engine": "2.7l v6 ",
                            "horsePower": "325",
                            "torque": "300"
                        }
                    }
                ]
            }
    }
]

This is the code I use to convert to xlsx

import json
import pandas as pd

# Load JSON data
with open('data.json') as json_file:
    data = json.load(json_file)

# Normalize JSON data to tabular format
df = pd.json_normalize(data)

df.to_excel('data.xlsx', index=False)

I want all of the various attributes to be in its own column. The engine data may be zero to two entries and should go into one each. The issue I am having is that it is putting all of the engine data into one column.

column f = [{'id': 1, 'attributes': {'engine': '5.0l v8 ', 'horsePower': '400', 'torque': '410'}}, {'id': 2, 'attributes': {'engine': '2.7l v6 ', 'horsePower': '325', 'torque': '300'}}]

I tried dataframe explode, but it created additional rows instead of columns.

2

Answers


  1. I suggest to create two dataframes from the data and then merge it together based on the id, e.g.:

    import pandas as pd
    
    data = [
        {
            "id": 1,
            "attributes": {
                "autoNumber": 1,
                "make": "Ford",
                "model": "F150",
                "trim": "Lariat",
            },
            "engine": {
                "data": [
                    {
                        "id": 1,
                        "attributes": {
                            "engine": "5.0l v8 ",
                            "horsePower": "400",
                            "torque": "410",
                        },
                    },
                    {
                        "id": 2,
                        "attributes": {
                            "engine": "2.7l v6 ",
                            "horsePower": "325",
                            "torque": "300",
                        },
                    },
                ]
            },
        }
    ]
    
    df1 = []
    for d in data:
        df1.append({"id": d["id"], **d["attributes"]})
    
    df2 = []
    for d in data:
        for engine_data in d["engine"]["data"]:
            df2.append(
                {
                    "id": d["id"],
                    "attribute_id": engine_data["id"],
                    **engine_data["attributes"],
                }
            )
    
    df1 = pd.DataFrame(df1)
    df2 = pd.DataFrame(df2)
    
    df_out = df1.merge(df2, on="id", how="outer")
    print(df_out)
    

    Prints:

       id  autoNumber  make model    trim  attribute_id    engine horsePower torque
    0   1           1  Ford  F150  Lariat             1  5.0l v8         400    410
    1   1           1  Ford  F150  Lariat             2  2.7l v6         325    300
    
    Login or Signup to reply.
  2. You can use pd.json_normalize()

    Code:

    import json
    
    import pandas as pd
    
    
    with open("data.json") as json_file:
        data = json.load(fp=json_file)
    
    df = pd.json_normalize(data=data,
                           record_path=["engine", "data"],
                           meta=["attributes", "id"],
                           meta_prefix="meta_"
                           )
    df = (pd
          .concat(objs=[df, df.pop(item="meta_attributes").apply(func=pd.Series)], axis=1)
          .rename(columns=lambda x: x.split(".")[-1])
          )
    print(df)
    

    Output:

      id    engine horsePower torque meta_id  autoNumber  make model    trim
    0   1  5.0l v8         400    410       1           1  Ford  F150  Lariat
    1   2  2.7l v6         325    300       1           1  Ford  F150  Lariat
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search