
How to expand JSON and nested JSON stored in DataFrame columns into new columns in Python Pandas?

dingaro
December 23, 2022
256 views
0 votes
2 Answers

I have a DataFrame like the one below:

Data types:

COL1 – float
COL2 – int
COL3 – int
COL4 – float
COL5 – float
COL6 – object
COL7 – object

Source code:

import numpy as np
import pandas as pd

a = pd.DataFrame()
a["COL1"] = [0.0, 800.0]
a["COL2"] = [2, 3]
a["COL3"] = [123, 444]
a["COL4"] = [1500.0, 1600.0]
a["COL5"] = [700.0, 850.0]
a["COL6"] = ['{"account": {"sector": 2, "other": 15}}', np.nan]
a["COL7"] = ['{"value": "ab"}', np.nan]

COL6 and COL7 contain JSON; COL6 contains nested JSON.
Furthermore, there can be missing values in both COL6 and COL7.
I need to convert the values from COL6 and COL7 into "normal" columns, but I cannot work out how to turn COL6 (nested JSON) into ordinary DataFrame columns with values.

Desired output:

For COL7 the output should look like the table below, but I cannot work out what the output for COL6 should look like.

COL1  | COL2 | COL3 | COL4   | COL5  | value |
------|------|------|--------|-------|-------|
0.0   | 2    | 123  | 1500.0 | 700.0 | ab    |
800.0 | 3    | 444  | 1600.0 | 850.0 | NaN   |

How can I do that in Python Pandas?

The following attempt does not work: pd.json_normalize(df['COL7'].apply(ast.literal_eval)) raises ValueError: malformed node or string: nan

Sample data (be aware that when I read it into Pandas there are also NaN values):

{'COL1': [0.0, 0.0, 0.0],
 'COL2': [2, 0, 33],
 'COL3': [2162561990, 2167912785, 599119703],
 'COL4': [1500.0, 500.0, 3500.0],
 'COL5': [750.0, 0.0, 3500.0],
 'COL6': ['{"account": {"sector": 4, "other": 10}, "account_2": {"sector": 0, "other": 0}, "account_3": {"sector": 6, "other": 8}}'],
 'COL7': ['{"value": "cc", "value_2": 15.58, "value_3": 646}']}

Answers

You can try something like the code below, which first converts the JSON from nested to flat form.

The error you were receiving is caused by the NaN values; to avoid it, I have used an if/else condition.

Code:

import ast
import pandas as pd

for col in ['COL6', 'COL7']:
    # Replace missing values with '' and flatten each parsed JSON object into a one-level dict
    a[col] = a[col].apply(lambda x: '' if pd.isnull(x) else list(pd.json_normalize(ast.literal_eval(x)).T.to_dict().values())[0])
a

#output

    COL1  COL2  COL3    COL4   COL5                                        COL6             COL7
0    0.0     2   123  1500.0  700.0  {'account.sector': 2, 'account.other': 15}  {'value': 'ab'}
1  800.0     3   444  1600.0  850.0
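
A side note on the NaN guard used above: ast.literal_eval only accepts strings (or AST nodes), so passing the float nan raises the ValueError quoted in the question. The sketch below shows one possible alternative guard, assuming every non-missing cell holds valid JSON text. parse_cell is a hypothetical helper name, and json.loads is swapped in for ast.literal_eval because it also understands JSON-only literals such as true and null.

```python
import ast
import json

import numpy as np
import pandas as pd


def parse_cell(cell):
    """Decode one JSON cell; return None for missing values (hypothetical helper)."""
    if pd.isna(cell):
        return None
    return json.loads(cell)


# Reproduce the failure from the question: nan is a float, not a string.
try:
    ast.literal_eval(np.nan)
    literal_eval_failed = False
except ValueError:
    literal_eval_failed = True

# json.loads copes with JSON literals that are not valid Python literals.
decoded = parse_cell('{"active": true, "note": null}')
missing = parse_cell(np.nan)
```

Whether to return None, '' or an empty dict for missing cells is a design choice; an empty dict tends to be convenient when the result is later passed to pd.json_normalize.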

After flattening, I split that column into separate columns and concatenate them with the original data.

a = pd.concat([a, a['COL6'].apply(pd.Series).drop(0, axis=1)], axis=1)
a.columns = a.columns.str.split('.').str[-1]

Output: you will get all the columns (only the new ones are shown here); drop the unnecessary ones.

                sector          other
0                 2.0            15.0
1                 NaN             NaN
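
For comparison, both JSON columns could probably be expanded in one pass with json.loads and pd.json_normalize. This is only a sketch under a few assumptions: the cells are either valid JSON strings or missing, expand_json_column is a hypothetical helper name, and prefixing the new columns with the source column name is my own choice to avoid name clashes, not something the question asks for.

```python
import json

import numpy as np
import pandas as pd

# A reduced copy of the question's frame (COL3 to COL5 left out for brevity).
a = pd.DataFrame({
    "COL1": [0.0, 800.0],
    "COL2": [2, 3],
    "COL6": ['{"account": {"sector": 2, "other": 15}}', np.nan],
    "COL7": ['{"value": "ab"}', np.nan],
})


def expand_json_column(frame, column):
    """Flatten one JSON-string column into its own DataFrame (hypothetical helper)."""
    # Missing cells become empty dicts, which json_normalize turns into NaN rows.
    records = [json.loads(v) if isinstance(v, str) else {} for v in frame[column]]
    flat = pd.json_normalize(records)
    flat.index = frame.index  # keep rows aligned with the original frame
    return flat.add_prefix(f"{column}.")


json_cols = ["COL6", "COL7"]
expanded = [expand_json_column(a, c) for c in json_cols]
result = pd.concat([a.drop(columns=json_cols), *expanded], axis=1)
```

With this layout the nested keys of COL6 end up as COL6.account.sector and COL6.account.other, and the row with missing JSON simply holds NaN in the new columns.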

Just for the fun of it, this might be a solution as well: restructure the data into dictionaries in a different format.

import pandas as pd
import json

data = '''{
    "COL1": [0.0, 0.0, 0.0],
    "COL2": [2, 0, 33],
    "COL3": [2162561990, 2167912785, 599119703],
    "COL4": [1500.0, 500.0, 3500.0],
    "COL5": [750.0, 0.0, 3500.0],
    "COL6": [
        {
            "account": {"sector": 4, "other": 10}, 
            "account_2": {"sector": 0, "other": 0},
            "account_3": {"sector": 6, "other": 8}
        }
    ],
    "COL7": [
        {
            "value": "cc",
            "value_2": 15.58, 
            "value_3": 646}
        ]
    }

'''

d = json.loads(data)

d1, d2, d3 = {}, {}, {}
cols = []

for k in list(d.keys()):
    if not isinstance(d[k][0], dict):
        d1[k] = d[k][0]
        d2[k] = d[k][1]
        d3[k] = d[k][2]

    else:
        cols = list(d[k][0].keys())
        d1[cols[0]] = d[k][0][cols[0]]
        d2[cols[1]] = d[k][0][cols[1]]
        d3[cols[2]] = d[k][0][cols[2]]

df = pd.concat([pd.json_normalize(d1), pd.json_normalize(d2), pd.json_normalize(d3)], ignore_index=True)

yields:

       COL1  COL2        COL3    COL4    COL5 value  account.sector  account.other  value_2  account_2.sector  account_2.other  value_3  account_3.sector  account_3.other
    0   0.0     2  2162561990  1500.0   750.0    cc             4.0           10.0      NaN               NaN              NaN      NaN               NaN              NaN
    1   0.0     0  2167912785   500.0     0.0   NaN             NaN            NaN    15.58               0.0              0.0      NaN               NaN              NaN
    2   0.0    33   599119703  3500.0  3500.0   NaN             NaN            NaN      NaN               NaN              NaN    646.0               6.0              8.0
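
The same restructuring can be written as a loop so that it does not depend on exactly three rows. This is a sketch that keeps the assumption made in the code above, namely that the i-th key of each nested dictionary belongs to the i-th row; rows and n_rows are names introduced here for illustration.

```python
import pandas as pd

d = {
    "COL1": [0.0, 0.0, 0.0],
    "COL2": [2, 0, 33],
    "COL3": [2162561990, 2167912785, 599119703],
    "COL4": [1500.0, 500.0, 3500.0],
    "COL5": [750.0, 0.0, 3500.0],
    "COL6": [{
        "account": {"sector": 4, "other": 10},
        "account_2": {"sector": 0, "other": 0},
        "account_3": {"sector": 6, "other": 8},
    }],
    "COL7": [{"value": "cc", "value_2": 15.58, "value_3": 646}],
}

n_rows = len(d["COL1"])
rows = []
for i in range(n_rows):
    row = {}
    for col, values in d.items():
        if isinstance(values[0], dict):
            # Assumption: the i-th key of the nested dict goes with row i.
            key = list(values[0])[i]
            row[key] = values[0][key]
        else:
            row[col] = values[i]
    rows.append(row)

# json_normalize flattens the nested dicts into dotted column names.
df = pd.json_normalize(rows)
```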
