Split json data into multiple column with pandas

hungry7
January 15, 2024
163 views
0 votes
2 Answers

I have json with nested structure and each json has different keys.

                                 json_field
1 {"1": {"id": 1, "value": "value1"}, "2": {"id": 2, "value": "value2"}}
2 {"1": {"id": 1, "value": "value1"}, "3": {"id": 3, "value": "value3"}}
3 {"3": {"id": 3, "value": "value3"}, "4": {"id": 4, "value": "value4"}}

I want to split each key into separate field

   field1     field2     field3     field4
1  value1     value2       -           -
2  value1        -       value3        -
3     -          -       value3     value4

For now i have dict with "header:id" mapping

headers_mapping = {"field1": 1, "field2": 2, "field3": 3, "field4": 4}

Then i iterate through this dict and find values for each header in each row

for header, field_id in headers_mapping.items():
    df[header] = df.apply(
        lambda x: (
            x["json_field"][str(field_id)].get("value", "")
            if x["json_field"].get(str(field_id)) is not None
            else "-"
        ),
        axis=1
    )
df.drop("json_field", axis=1, inplace=True)

Please advise a more effective way to solve this problem.

Answers

You can try:

import json

# apply `json.loads` if necessary
df["json_field"] = df["json_field"].apply(json.loads)

data = []
for d in df["json_field"]:
    dct = {}
    for k, v in d.items():
        dct[f"field{k}"] = v["value"]
    data.append(dct)

out = pd.DataFrame(data)
print(out)

Prints:

   field1  field2  field3  field4
0  value1  value2     NaN     NaN
1  value1     NaN  value3     NaN
2     NaN     NaN  value3  value4

- wjandrea
- January 15, 2024 at 9:13 pm
- 0 votes
0
There are many possible solutions. Generally though, you’ll probably want to:
1. Not loop over fields; instead let Pandas split the fields for you
2. Use an actual missing value
  - But later if you want to represent it differently, you can do that, e.g. using the na_rep parameter to df.style.format
For the first step, you can look at Split / Explode a column of dictionaries into separate columns with pandas. I’ll use Lech Birek’s solution (json_normalize) then drop the "id" columns and rename the "value" columns.
```
headers_mapping = {'1': 'field1', '2': 'field2', '3': 'field3', '4': 'field4'}
(
    pd.json_normalize(df['json_field'])
    .filter(like='value')
    .rename(columns=lambda label: headers_mapping[label.rstrip('.value')])
)
```
```
   field1  field2  field3  field4
0  value1  value2     NaN     NaN
1  value1     NaN  value3     NaN
2     NaN     NaN  value3  value4
```
If you also need to sort the columns, tack this on at the end:
```
.reindex(columns=headers_mapping.values())
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.