skip to Main Content

I have the following JSON structure,

{
   'total_numbers':1,
   'data':[
      {
         'col3':'2',
         'col4':[
            {
               'col5':'P',
               'col6':'H'
            }
         ],
         'col7':'2023-06-19T09:29:28.786Z',
         'col9':{
            'col10':'TEST',
            'col11':'[email protected]',
            'col12':'True',
            'col13':'999',
            'col14':'9999'
         },
         'col15':'2023-07-10T04:46:43.003Z',
         'col16':False,
         'col17':[
            {
               'col18':'S',
               'col19':'H'
            }
         ],
         'col20':True,
         'col21':{
            'col22':'sss',
            'col23':'0.0.0.0',
            'col24':'lll'
         },
         'col25':0,
         'col26':{
            'col27':{
               'col28':'Other'
            },
            'col29':'Other',
            'col30':'cccc'
         },
         'col31':{
            'col32':[
               {
                  'col33':'123456789',
                  'col34':'2023-07-14T02:52:20.166Z',
                  'col36':True,
                  'col38':{
                     'col40':[
                        {
                           'col41':'99999999999',
                        },
                        {
                           'col41':'34534543535',
                        }
                     ]
                  },
                  'col55':'878787878'
               },
               {
                  'col47':'112233445566',
                  'col48':'2023-07-24T09:26:03.425Z',
                  'col50':True,
                  'col52':{
                     'col53':[
                        {
                           'col54':'99999999999',
                        }
                     ]
                  },
                  'col55':'878787878'
               }
            ]
         }
      }
   ]
}

Is there a dynamic function I could use to convert this structure to tabular form based on traversing through the JSON?

I came across this function (marked as solution) which i think could be helpful for my case. In the example I understand there will be 6 rows based on the number of levels in the json though I don’t understand the naming of some columns. The data is there but the column names are not correct a/c to its level. Is there any update I could make to the function to achieve this?

For example, the name of the columns should be actually root_data_col4_col5 and root_data_col4_col6

enter image description here

2

Answers


  1. The following works in 2 steps:
    firstly it breaks the JSON into alternating dicts and lists – removing all dictionaries containing dictionaries as top level values.
    It then parses this new structure so that whenever there is a list (and hence the potential for duplicate keys) it creates a new row item for each element. Doing things this way results in 2 rows with data in them for ‘root_data_col31_col32_col55’ which I think is the intended behaviour.

    import pandas as pd
    
    def flatten_dict(x, key="root"):
        result = {}
        # assume we have a dict
        for k, v in x.items():
            new_key = key + "_" + k
            if isinstance(v, list):
                tmp = [flatten_dict(i, new_key) for i in v]
                result[new_key] = tmp
            elif isinstance(v, dict):
                tmp = flatten_dict(v, key=new_key)
                result.update(tmp)
            else:
                if k == "col32":
                    print(v)
                result[new_key] = v
        return result
    
    def parse_dict(flat_dict):
        tmp = {}
        result = [tmp]
        for k, v in flat_dict.items():
            if isinstance(v, list):
                for item in v:
                    result += parse_dict(item)
            else:
                tmp[k] = v
        return result
    
    z = flatten_dict(x)
    y = parse_dict(z)
    df = pd.DataFrame(y)
    
    Login or Signup to reply.
  2. I faced the same problem before and made a code that converts Json/Dict/List to a way that can be used in pandas dataframe.

    when I saw your problem, I uploaded the code to github to make you able to clone it, also I got tons of ideas after I read @Andrew Louw code, so I uploaded the package to PYPI called JsonDF.

    download the package via pip

    pip install JsonDF
    

    or from GitHub

    and I’ve no problem with contributions, so feel free to make anything.

    that’s the code based on your problem

    from JsonDf.Data import Data
    import pandas as pd
    
    data = Data(prefix='root', data=JsonData['data'])
    data.childs()
    data.flatten()
    rows = data.rows
    
    print(pd.DataFrame.from_dict(pd.json_normalize(rows), orient="columns").transpose())
    

    the output was like that :

    output image

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search