skip to Main Content

I’ve got the following jsonFile example that I’m trying to convert to a dataframe. I’ve looked through many examples on SO but all of them have a first level key in their json data. My data only has the value as the first level.

[
  {
    "product1": [
      {
        "start": 8,
        "end": 9,
        "part_number": "150-3333-900",
        "description": "Description for 150-3333-900"
      },
      {
        "start": 8,
        "end": 9,
        "part_number": "150-4444-900",
        "description": "Description for 150-4444-900"
      }
    ]
  },
  {
    "product2": [
      {
        "start": 10,
        "end": 11,
        "part_number": "140-2222-800",
        "description": "Description for 140-2222-800"
      },
      {
        "start": 10,
        "end": 11,
        "part_number": "140-5555-800",
        "description": "Description for 140-5555-800"
      }
    ]
  }
]

I’ve tried the following and got a NotImplementedError. I thought this should work since I’m passing in a list of dictionaries.

import pandas as pd
import json

df = pd.json_normalize(jsonFile)

I tried the following and got some output but not in the format I’m looking for.

with open(jsonFile) as f:
    data = json.load(f)
df = pd.DataFrame.from_dict(data)
display(df)

And I get this:

enter image description here

I also get the same output using this line of code:

df = pd.read_json(jsonFile, orient='records')
display(df)

What I’m looking for is this:

enter image description here

Can someone point me in the right direction? I tried transposing the output that I got above so my column headers would be the beginning of each row but I couldn’t get that to work either. Thank you.

2

Answers


  1. You can achieve your desired result by iterating the dicts in your list, applying json_normalize to each one and appending the key as the product name:

    products = json.load(jsonFile)
    dfs = []
    for product in products:
        for name, details in product.items():
            dfs.append(pd.json_normalize(details).assign(pcr=name))
    
    out = pd.concat(dfs).reset_index(drop=True)
    

    Output for your sample data:

       start  end   part_number                   description       pcr
    0      8    9  150-3333-900  Description for 150-3333-900  product1
    1      8    9  150-4444-900  Description for 150-4444-900  product1
    2     10   11  140-2222-800  Description for 140-2222-800  product2
    3     10   11  140-5555-800  Description for 140-5555-800  product2
    
    Login or Signup to reply.
  2. Another possible option :

    df = pd.DataFrame([{"pcr": k, **p} for d in data for k, v in d.items() for p in v])
    

    Output :

    print(df)
    
            pcr  start  end   part_number                   description
    0  product1      8    9  150-3333-900  Description for 150-3333-900
    1  product1      8    9  150-4444-900  Description for 150-4444-900
    2  product2     10   11  140-2222-800  Description for 140-2222-800
    3  product2     10   11  140-5555-800  Description for 140-5555-800
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search