skip to Main Content

I have json with nested structure and each json has different keys.

                                 json_field
1 {"1": {"id": 1, "value": "value1"}, "2": {"id": 2, "value": "value2"}}
2 {"1": {"id": 1, "value": "value1"}, "3": {"id": 3, "value": "value3"}}
3 {"3": {"id": 3, "value": "value3"}, "4": {"id": 4, "value": "value4"}}

I want to split each key into separate field

   field1     field2     field3     field4
1  value1     value2       -           -
2  value1        -       value3        -
3     -          -       value3     value4

For now i have dict with "header:id" mapping

headers_mapping = {"field1": 1, "field2": 2, "field3": 3, "field4": 4}

Then i iterate through this dict and find values for each header in each row

for header, field_id in headers_mapping.items():
    df[header] = df.apply(
        lambda x: (
            x["json_field"][str(field_id)].get("value", "")
            if x["json_field"].get(str(field_id)) is not None
            else "-"
        ),
        axis=1
    )
df.drop("json_field", axis=1, inplace=True)

Please advise a more effective way to solve this problem.

2

Answers


  1. You can try:

    import json
    
    # apply `json.loads` if necessary
    df["json_field"] = df["json_field"].apply(json.loads)
    
    data = []
    for d in df["json_field"]:
        dct = {}
        for k, v in d.items():
            dct[f"field{k}"] = v["value"]
        data.append(dct)
    
    out = pd.DataFrame(data)
    print(out)
    

    Prints:

       field1  field2  field3  field4
    0  value1  value2     NaN     NaN
    1  value1     NaN  value3     NaN
    2     NaN     NaN  value3  value4
    
    Login or Signup to reply.
  2. There are many possible solutions. Generally though, you’ll probably want to:

    1. Not loop over fields; instead let Pandas split the fields for you
    2. Use an actual missing value
      • But later if you want to represent it differently, you can do that, e.g. using the na_rep parameter to df.style.format

    For the first step, you can look at Split / Explode a column of dictionaries into separate columns with pandas. I’ll use Lech Birek’s solution (json_normalize) then drop the "id" columns and rename the "value" columns.

    headers_mapping = {'1': 'field1', '2': 'field2', '3': 'field3', '4': 'field4'}
    (
        pd.json_normalize(df['json_field'])
        .filter(like='value')
        .rename(columns=lambda label: headers_mapping[label.rstrip('.value')])
    )
    
       field1  field2  field3  field4
    0  value1  value2     NaN     NaN
    1  value1     NaN  value3     NaN
    2     NaN     NaN  value3  value4
    

    If you also need to sort the columns, tack this on at the end:

    .reindex(columns=headers_mapping.values())
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search