Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

How to convert a nested JSON consisting of lists, ints, dicts, strs and None to pandas dataframe?

royalewithcheese
August 14, 2023
144 views
1 vote
2 Answers

I have the following JSON structure,

{
   'total_numbers':1,
   'data':[
      {
         'col3':'2',
         'col4':[
            {
               'col5':'P',
               'col6':'H'
            }
         ],
         'col7':'2023-06-19T09:29:28.786Z',
         'col9':{
            'col10':'TEST',
            'col11':'[email protected]',
            'col12':'True',
            'col13':'999',
            'col14':'9999'
         },
         'col15':'2023-07-10T04:46:43.003Z',
         'col16':False,
         'col17':[
            {
               'col18':'S',
               'col19':'H'
            }
         ],
         'col20':True,
         'col21':{
            'col22':'sss',
            'col23':'0.0.0.0',
            'col24':'lll'
         },
         'col25':0,
         'col26':{
            'col27':{
               'col28':'Other'
            },
            'col29':'Other',
            'col30':'cccc'
         },
         'col31':{
            'col32':[
               {
                  'col33':'123456789',
                  'col34':'2023-07-14T02:52:20.166Z',
                  'col36':True,
                  'col38':{
                     'col40':[
                        {
                           'col41':'99999999999',
                        },
                        {
                           'col41':'34534543535',
                        }
                     ]
                  },
                  'col55':'878787878'
               },
               {
                  'col47':'112233445566',
                  'col48':'2023-07-24T09:26:03.425Z',
                  'col50':True,
                  'col52':{
                     'col53':[
                        {
                           'col54':'99999999999',
                        }
                     ]
                  },
                  'col55':'878787878'
               }
            ]
         }
      }
   ]
}

Is there a dynamic function I could use to convert this structure to tabular form based on traversing through the JSON?

I came across this function (marked as solution) which i think could be helpful for my case. In the example I understand there will be 6 rows based on the number of levels in the json though I don’t understand the naming of some columns. The data is there but the column names are not correct a/c to its level. Is there any update I could make to the function to achieve this?

For example, the name of the columns should be actually root_data_col4_col5 and root_data_col4_col6

Answers

The following works in 2 steps:
firstly it breaks the JSON into alternating dicts and lists – removing all dictionaries containing dictionaries as top level values.
It then parses this new structure so that whenever there is a list (and hence the potential for duplicate keys) it creates a new row item for each element. Doing things this way results in 2 rows with data in them for ‘root_data_col31_col32_col55’ which I think is the intended behaviour.

import pandas as pd

def flatten_dict(x, key="root"):
    result = {}
    # assume we have a dict
    for k, v in x.items():
        new_key = key + "_" + k
        if isinstance(v, list):
            tmp = [flatten_dict(i, new_key) for i in v]
            result[new_key] = tmp
        elif isinstance(v, dict):
            tmp = flatten_dict(v, key=new_key)
            result.update(tmp)
        else:
            if k == "col32":
                print(v)
            result[new_key] = v
    return result

def parse_dict(flat_dict):
    tmp = {}
    result = [tmp]
    for k, v in flat_dict.items():
        if isinstance(v, list):
            for item in v:
                result += parse_dict(item)
        else:
            tmp[k] = v
    return result

z = flatten_dict(x)
y = parse_dict(z)
df = pd.DataFrame(y)

- OsamaElsherif
- August 14, 2023 at 11:40 pm
- 0 votes
0
I faced the same problem before and made a code that converts Json/Dict/List to a way that can be used in pandas dataframe.

when I saw your problem, I uploaded the code to github to make you able to clone it, also I got tons of ideas after I read @Andrew Louw code, so I uploaded the package to PYPI called JsonDF.

download the package via pip
```
pip install JsonDF
```
or from GitHub

and I’ve no problem with contributions, so feel free to make anything.

that’s the code based on your problem
```
from JsonDf.Data import Data
import pandas as pd

data = Data(prefix='root', data=JsonData['data'])
data.childs()
data.flatten()
rows = data.rows

print(pd.DataFrame.from_dict(pd.json_normalize(rows), orient="columns").transpose())
```
the output was like that :

output image
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.