skip to Main Content

I am trying to normalize a nested JSON file in pandas.
I am unable to get the ab_id column in the beginning as observed in the current output screenshot.
Additionally, since if I remove record prefix from my code, I am seeing an error and if I add it it generates a couple of columns which are empty.

The current and expected output is attached in the screenshot below:
enter image description here

Code used currently:

df=pd.json_normalize(data=response[‘val’],record_path=[‘activity’],meta=[‘msn’,’iis’,’ica’,’iada’],errors=’ignore’, record_prefix=’_’)

JSON file:

{
   "id":"ijewiofn23441312223",
   "val":[
      {
         "ab_id":"ab_123",
         "activity":[
            {
               "msn":"acpfile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            },
            {
               "msn":"adefile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            }
            }
         ]
      },
      {
         "ab_id":"ab_421",
         "activity":[
            {
               "msn":"adbfile_source_conn",
               "iia":true,
               "ica":true,
               "iada":false
            },
            {
               "msn":"aile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            }
            }
         ]
      }
   ]
}

Can someone please help out?
Thanks so much in advance.

3

Answers


  1. import pandas as pd
    from pandas import json_normalize
    
    data = {
        "id": "ijewiofn23441312223",
        "val": [
            {
                "ab_id": "ab_123",
                "activity": [
                    {
                        "msn": "acpfile_source_conn",
                        "iia": True,
                        "ica": False,
                        "iada": False
                    },
                    {
                        "msn": "adefile_source_conn",
                        "iia": True,
                        "ica": False,
                        "iada": False
                    }
                ]
            },
            {
                "ab_id": "ab_421",
                "activity": [
                    {
                        "msn": "adbfile_source_conn",
                        "iia": True,
                        "ica": True,
                        "iada": False
                    },
                    {
                        "msn": "aile_source_conn",
                        "iia": True,
                        "ica": False,
                        "iada": False
                    }
                ]
            }
        ]
    }
    df = pd.json_normalize(data['val'], record_path=['activity'], meta=['ab_id'])
    df = df[['ab_id'] + [col for col in df.columns if col != 'ab_id']]
    df.columns = ['id'] + df.columns[1:].tolist()
    
    print(df)
    
    Login or Signup to reply.
  2. You can try to use json module and construct the DataFrame manually:

    import json
    
    with open("data.json", "r") as f_in:
        data = json.load(f_in)
    
    df = pd.DataFrame(
        [{"ab_id": v["ab_id"], **a} for v in data["val"] for a in v["activity"]]
    )
    
    print(df)
    

    Prints:

        ab_id                  msn   iia    ica   iada
    0  ab_123  acpfile_source_conn  True  False  False
    1  ab_123  adefile_source_conn  True  False  False
    2  ab_421  adbfile_source_conn  True   True  False
    3  ab_421     aile_source_conn  True  False  False
    

    Contents of data.json:

    {
       "id":"ijewiofn23441312223",
       "val":[
          {
             "ab_id":"ab_123",
             "activity":[
                {
                   "msn":"acpfile_source_conn",
                   "iia":true,
                   "ica":false,
                   "iada":false
                },
                {
                   "msn":"adefile_source_conn",
                   "iia":true,
                   "ica":false,
                   "iada":false
                }
    
             ]
          },
          {
             "ab_id":"ab_421",
             "activity":[
                {
                   "msn":"adbfile_source_conn",
                   "iia":true,
                   "ica":true,
                   "iada":false
                },
                {
                   "msn":"aile_source_conn",
                   "iia":true,
                   "ica":false,
                   "iada":false
                }
    
             ]
          }
       ]
    }
    
    Login or Signup to reply.
  3. There is two extra } in your json example. Also, this one doesn’t match the I/O images.

    But you can still try this :

    df = pd.json_normalize(response["val"], "activity", "ab_id")
    ​
    # if columns-order is important
    df = df[np.roll(df.columns, 1)]
    


    Output :

    print(df)
    
        ab_id                  msn   iia    ica   iada
    0  ab_123  acpfile_source_conn  True  False  False
    1  ab_123  adefile_source_conn  True  False  False
    2  ab_421  adbfile_source_conn  True   True  False
    3  ab_421     aile_source_conn  True  False  False
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search