Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

Convert nested JSON with no first level keys into Pandas dataframe

JoeG
May 23, 2023
269 views
0 votes
2 Answers

I’ve got the following jsonFile example that I’m trying to convert to a dataframe. I’ve looked through many examples on SO but all of them have a first level key in their json data. My data only has the value as the first level.

[
  {
    "product1": [
      {
        "start": 8,
        "end": 9,
        "part_number": "150-3333-900",
        "description": "Description for 150-3333-900"
      },
      {
        "start": 8,
        "end": 9,
        "part_number": "150-4444-900",
        "description": "Description for 150-4444-900"
      }
    ]
  },
  {
    "product2": [
      {
        "start": 10,
        "end": 11,
        "part_number": "140-2222-800",
        "description": "Description for 140-2222-800"
      },
      {
        "start": 10,
        "end": 11,
        "part_number": "140-5555-800",
        "description": "Description for 140-5555-800"
      }
    ]
  }
]

I’ve tried the following and got a NotImplementedError. I thought this should work since I’m passing in a list of dictionaries.

import pandas as pd
import json

df = pd.json_normalize(jsonFile)

I tried the following and got some output but not in the format I’m looking for.

with open(jsonFile) as f:
    data = json.load(f)
df = pd.DataFrame.from_dict(data)
display(df)

And I get this:

I also get the same output using this line of code:

df = pd.read_json(jsonFile, orient='records')
display(df)

What I’m looking for is this:

Can someone point me in the right direction? I tried transposing the output that I got above so my column headers would be the beginning of each row but I couldn’t get that to work either. Thank you.

Answers

You can achieve your desired result by iterating the dicts in your list, applying json_normalize to each one and appending the key as the product name:

products = json.load(jsonFile)
dfs = []
for product in products:
    for name, details in product.items():
        dfs.append(pd.json_normalize(details).assign(pcr=name))

out = pd.concat(dfs).reset_index(drop=True)

Output for your sample data:

   start  end   part_number                   description       pcr
0      8    9  150-3333-900  Description for 150-3333-900  product1
1      8    9  150-4444-900  Description for 150-4444-900  product1
2     10   11  140-2222-800  Description for 140-2222-800  product2
3     10   11  140-5555-800  Description for 140-5555-800  product2

Another possible option :

df = pd.DataFrame([{"pcr": k, **p} for d in data for k, v in d.items() for p in v])

Output :

print(df)

        pcr  start  end   part_number                   description
0  product1      8    9  150-3333-900  Description for 150-3333-900
1  product1      8    9  150-4444-900  Description for 150-4444-900
2  product2     10   11  140-2222-800  Description for 140-2222-800
3  product2     10   11  140-5555-800  Description for 140-5555-800

Please signup or login to give your own answer.

Click here to cancel reply.