Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

Python: json_normalize gives AttributeError for list of dict values

Dcook
April 10, 2023
231 views
0 votes
2 Answers

I have a pandas dataframe where 2 columns are nested column having decimal value: df.tail(1).to_dict('list') gives this kind of data

 {'nested_col1': [array([{'key1': 'CO', 'key2': Decimal('8.940000000')}],
  dtype=object)], 'nested_col2': [array([{'key3': 'CO', 'key4': 'P14', 'key5': Decimal('8.940000000'), 'key6': None}],
  dtype=object)]}

I am trying to explode the dataframe with this:

df = (df.drop(cols, axis=1)
        .join(pd.concat(
            [pd.json_normalize(df[x].explode(), errors='ignore').applymap(
                lambda x: str(x) if isinstance(x, (int, float)) else x).add_prefix(f'{x}.') for x in
             cols],
            axis=1)))

With this I am getting below error in some cases:

    Traceback (most recent call last):
  File "data_load.py.py", line 365, in <module>
    df = prepare_data(data, transaction_id, cohort_no)
  File "data_load.py.py", line 274, in prepare_data
    df = flatten_dataframe(cols_to_explode, df)
  File "data_load.py.py", line 204, in flatten_dataframe
    df1 = pd.concat([pd.json_normalize(df[c].explode()) for c in cols],
  File "data_load.py.py", line 204, in <listcomp>
    df1 = pd.concat([pd.json_normalize(df[c].explode()) for c in cols],
  File "/project1/venv/lib/python3.6/site-packages/pandas/io/json/_normalize.py", line 270, in _json_normalize
    if any([isinstance(x, dict) for x in y.values()] for y in data):
  File "/project1/venv/lib/python3.6/site-packages/pandas/io/json/_normalize.py", line 270, in <genexpr>
    if any([isinstance(x, dict) for x in y.values()] for y in data):
AttributeError: 'float' object has no attribute 'values'
failed to run commands: exit status 1

anything still I am missing here or any better way to do the same?

Expected Output should be:

nested_col1.key1,nested_col1.key2 nested_col2.key3 ... like this

Answers

- Corralien
- April 10, 2023 at 4:18 pm
- 0 votes
0
It seems there is only one element for each nested column:
```
out = pd.concat([pd.json_normalize(df[x][0]).add_prefix(f'{x}.') 
                     for x in cols], axis=1)
out = out.apply(pd.to_numeric, errors='coerce').fillna(out)
print(out)

# Output
  nested_col1.key1  nested_col1.key2 nested_col2.key3 nested_col2.key4  nested_col2.key5 nested_col2.key6
0               CO              8.94               CO              P14              8.94             None
```
Note: you can replace [0] by .explode() if you have more than one record in each nested list.
Login or Signup to reply.

You can use json_normalize and concat:

cols = ['nested_col1', 'nested_col2']

out = pd.concat([pd.json_normalize(df[c].explode()) for c in cols],
                keys=cols, axis=1)
out.columns = out.columns.map('.'.join)

Output:

  nested_col1.key1 nested_col1.key2 nested_col2.key3 nested_col2.key4 nested_col2.key5 nested_col2.key6
0               CO      8.940000000               CO              P14      8.940000000             None

Please signup or login to give your own answer.

Click here to cancel reply.