Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

Python Pandas- Dataframe from Nested JSON

Aakash
September 20, 2023
344 views
2 votes
3 Answers

I am trying to normalize a nested JSON file in pandas.
I am unable to get the ab_id column in the beginning as observed in the current output screenshot.
Additionally, since if I remove record prefix from my code, I am seeing an error and if I add it it generates a couple of columns which are empty.

The current and expected output is attached in the screenshot below:

Code used currently:

df=pd.json_normalize(data=response[‘val’],record_path=[‘activity’],meta=[‘msn’,’iis’,’ica’,’iada’],errors=’ignore’, record_prefix=’_’)

JSON file:

{
   "id":"ijewiofn23441312223",
   "val":[
      {
         "ab_id":"ab_123",
         "activity":[
            {
               "msn":"acpfile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            },
            {
               "msn":"adefile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            }
            }
         ]
      },
      {
         "ab_id":"ab_421",
         "activity":[
            {
               "msn":"adbfile_source_conn",
               "iia":true,
               "ica":true,
               "iada":false
            },
            {
               "msn":"aile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            }
            }
         ]
      }
   ]
}

Can someone please help out?
Thanks so much in advance.

Answers

import pandas as pd
from pandas import json_normalize

data = {
    "id": "ijewiofn23441312223",
    "val": [
        {
            "ab_id": "ab_123",
            "activity": [
                {
                    "msn": "acpfile_source_conn",
                    "iia": True,
                    "ica": False,
                    "iada": False
                },
                {
                    "msn": "adefile_source_conn",
                    "iia": True,
                    "ica": False,
                    "iada": False
                }
            ]
        },
        {
            "ab_id": "ab_421",
            "activity": [
                {
                    "msn": "adbfile_source_conn",
                    "iia": True,
                    "ica": True,
                    "iada": False
                },
                {
                    "msn": "aile_source_conn",
                    "iia": True,
                    "ica": False,
                    "iada": False
                }
            ]
        }
    ]
}
df = pd.json_normalize(data['val'], record_path=['activity'], meta=['ab_id'])
df = df[['ab_id'] + [col for col in df.columns if col != 'ab_id']]
df.columns = ['id'] + df.columns[1:].tolist()

print(df)

You can try to use json module and construct the DataFrame manually:

import json

with open("data.json", "r") as f_in:
    data = json.load(f_in)

df = pd.DataFrame(
    [{"ab_id": v["ab_id"], **a} for v in data["val"] for a in v["activity"]]
)

print(df)

Prints:

    ab_id                  msn   iia    ica   iada
0  ab_123  acpfile_source_conn  True  False  False
1  ab_123  adefile_source_conn  True  False  False
2  ab_421  adbfile_source_conn  True   True  False
3  ab_421     aile_source_conn  True  False  False

Contents of data.json:

{
   "id":"ijewiofn23441312223",
   "val":[
      {
         "ab_id":"ab_123",
         "activity":[
            {
               "msn":"acpfile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            },
            {
               "msn":"adefile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            }

         ]
      },
      {
         "ab_id":"ab_421",
         "activity":[
            {
               "msn":"adbfile_source_conn",
               "iia":true,
               "ica":true,
               "iada":false
            },
            {
               "msn":"aile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            }

         ]
      }
   ]
}

- Timeless
- September 20, 2023 at 10:26 pm
- 0 votes
0
There is two extra } in your json example. Also, this one doesn’t match the I/O images.

But you can still try this :
```
df = pd.json_normalize(response["val"], "activity", "ab_id")

# if columns-order is important
df = df[np.roll(df.columns, 1)]
```
Output :
```
print(df)

    ab_id                  msn   iia    ica   iada
0  ab_123  acpfile_source_conn  True  False  False
1  ab_123  adefile_source_conn  True  False  False
2  ab_421  adbfile_source_conn  True   True  False
3  ab_421     aile_source_conn  True  False  False
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.