Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

How Do I unnest following structure of json into interpretable table using python?

Dan
February 21, 2023
201 views
0 votes
2 Answers

I have dataframe df which has column called test_col which contains json structures as shown below. As you can see lineItemPromotions object has nested jsons in it which can have 0-10 numbers of items in it. By unnesting, it should create new rows for each ID under lineItemPromotions.
How do I unnest this structures correctly?

{'provider': 'ABC',
 'discountCodes_out': [],
 'discounts_out': [],
 'lineItemPromotions': [{'id': '1',
   'discountCodes': [],
   'discounts': [{'rule': 'Bundle Discount',
     'name': 'Bundle Discount',
     'ruleId': '',
     'campaignId': '419f9a2f-0342-41c0-ac79-419d1023aaa9',
     'centAmount': 1733550}],
   'perUnitPromotionsShares': [1733550]},
  {'id': '2',
   'discountCodes': [],
   'discounts': [{'rule': 'Bundle Discount',
     'name': 'Bundle Discount',
     'ruleId': '',
     'campaignId': '419f9a2f-0342-41c0-ac79-419d1023aaa9',
     'centAmount': 119438}],
   'perUnitPromotionsShares': [119438, 119438]}]}

I tried following code but it is not working correctly. It is giving me nested item which I have to unnest again. Sorry that I have to paste the picture to show you the process how it is giving results.

Answers

You could first explode your columns then concat with a new df made from the discounts column:

df = pd.json_normalize(new_dict, meta='provider', record_path='lineItemPromotions')
df = df.apply(pd.Series.explode)
pd.concat([df.drop(columns='discounts').reset_index(drop=True), 
           pd.DataFrame(df['discounts'].values.tolist())], axis=1)

Output:

  id discountCodes perUnitPromotionsShares provider             rule             name ruleId                            campaignId  centAmount
0  1           NaN                 1733550      ABC  Bundle Discount  Bundle Discount         419f9a2f-0342-41c0-ac79-419d1023aaa9     1733550
1  2           NaN                  119438      ABC  Bundle Discount  Bundle Discount         419f9a2f-0342-41c0-ac79-419d1023aaa9      119438
2  2           NaN                  119438      ABC  Bundle Discount  Bundle Discount         419f9a2f-0342-41c0-ac79-419d1023aaa9      119438

Although verbose, you could explicitly normalize each level:

pd.concat(
   [
      pd.json_normalize(data).explode("lineItemPromotions")
        .drop(columns="lineItemPromotions").reset_index(drop=True),
      pd.json_normalize(data, record_path=["lineItemPromotions"])
         .drop(columns="discounts"),
      pd.json_normalize(data, record_path=["lineItemPromotions", "discounts"])
   ], 
   axis=1
)

  provider discountCodes_out discounts_out id discountCodes perUnitPromotionsShares             rule             name ruleId                            campaignId  centAmount
0      ABC                []            []  1            []               [1733550]  Bundle Discount  Bundle Discount         419f9a2f-0342-41c0-ac79-419d1023aaa9     1733550
1      ABC                []            []  2            []        [119438, 119438]  Bundle Discount  Bundle Discount         419f9a2f-0342-41c0-ac79-419d1023aaa9      119438

You could then .explode("perUnitPromotionsShares") if desired.

Please signup or login to give your own answer.

Click here to cancel reply.