skip to Main Content

I have dataframe df which has column called test_col which contains json structures as shown below. As you can see lineItemPromotions object has nested jsons in it which can have 0-10 numbers of items in it. By unnesting, it should create new rows for each ID under lineItemPromotions.
How do I unnest this structures correctly?

{'provider': 'ABC',
 'discountCodes_out': [],
 'discounts_out': [],
 'lineItemPromotions': [{'id': '1',
   'discountCodes': [],
   'discounts': [{'rule': 'Bundle Discount',
     'name': 'Bundle Discount',
     'ruleId': '',
     'campaignId': '419f9a2f-0342-41c0-ac79-419d1023aaa9',
     'centAmount': 1733550}],
   'perUnitPromotionsShares': [1733550]},
  {'id': '2',
   'discountCodes': [],
   'discounts': [{'rule': 'Bundle Discount',
     'name': 'Bundle Discount',
     'ruleId': '',
     'campaignId': '419f9a2f-0342-41c0-ac79-419d1023aaa9',
     'centAmount': 119438}],
   'perUnitPromotionsShares': [119438, 119438]}]}

I tried following code but it is not working correctly. It is giving me nested item which I have to unnest again. Sorry that I have to paste the picture to show you the process how it is giving results.

enter image description here

2

Answers


  1. You could first explode your columns then concat with a new df made from the discounts column:

    df = pd.json_normalize(new_dict, meta='provider', record_path='lineItemPromotions')
    df = df.apply(pd.Series.explode)
    pd.concat([df.drop(columns='discounts').reset_index(drop=True), 
               pd.DataFrame(df['discounts'].values.tolist())], axis=1)
    

    Output:

      id discountCodes perUnitPromotionsShares provider             rule             name ruleId                            campaignId  centAmount
    0  1           NaN                 1733550      ABC  Bundle Discount  Bundle Discount         419f9a2f-0342-41c0-ac79-419d1023aaa9     1733550
    1  2           NaN                  119438      ABC  Bundle Discount  Bundle Discount         419f9a2f-0342-41c0-ac79-419d1023aaa9      119438
    2  2           NaN                  119438      ABC  Bundle Discount  Bundle Discount         419f9a2f-0342-41c0-ac79-419d1023aaa9      119438
    
    Login or Signup to reply.
  2. Although verbose, you could explicitly normalize each level:

    pd.concat(
       [
          pd.json_normalize(data).explode("lineItemPromotions")
            .drop(columns="lineItemPromotions").reset_index(drop=True),
          pd.json_normalize(data, record_path=["lineItemPromotions"])
             .drop(columns="discounts"),
          pd.json_normalize(data, record_path=["lineItemPromotions", "discounts"])
       ], 
       axis=1
    )
    
      provider discountCodes_out discounts_out id discountCodes perUnitPromotionsShares             rule             name ruleId                            campaignId  centAmount
    0      ABC                []            []  1            []               [1733550]  Bundle Discount  Bundle Discount         419f9a2f-0342-41c0-ac79-419d1023aaa9     1733550
    1      ABC                []            []  2            []        [119438, 119438]  Bundle Discount  Bundle Discount         419f9a2f-0342-41c0-ac79-419d1023aaa9      119438
    

    You could then .explode("perUnitPromotionsShares") if desired.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search