skip to Main Content

Basically I have a dataframe that looks like this

       num    platform  region
id          
569094  10  AMZN    in
569094  5   AMZN    ca
569094  12  NFLX    us
385687  15  DIS in
385687  20  NFLX    ca

I need to convert this into json however its not as straight forward as creating a json records, since I need to use groupby

This is what I did so far

 main_df = main_df.groupby('id').agg(lambda x: x.tolist())
    result = main_df.to_json(orient='index')
    parsed = json.loads(result)
    print(parsed) 

which gives me this

{
   "385687":{
      "num":[
         15,
         20
      ],
      "platform":[
         "DIS",
         "NFLX"
      ],
      "region":[
         "in",
         "ca"
      ]
   },
   "569094":{
      "num":[
         10,
         5,
         12  
      ],
      "platform":[
         "AMZN",
         "AMZN",
         "NFLX"
      ],
      "region":[
         "in",
         "ca",
         "us"
      ]
   }
}

Expected output


"id":385687,
  "platforms":{
     "DIS":{
        "in":15      
     },
     "NFLX":{        
        "ca":20
     }

  "id":569094,
  "platforms":{
     "AMZN":{
        "in":10
        "ca":5
      
     },
     "NFLX":{        
        "us":12
     }

any help would be appreciated, I can’t find anyone asking similar question

2

Answers


  1. Its a little bit complex, but this should work:

    df = pd.read_clipboard() # copy data from OS question
    
    def make_json(x):
        data = {
                "id": x.index.min(),
                "platforms": {pl: {} for pl in x["platform"].unique()}
               }
        for pl, reg, num in zip(x["platform"], x["region"], x["num"]):
            data["platforms"][pl].update({reg: num})
        return data
    
    df.groupby('id').apply(make_json).tolist()
    
    [{'id': 385687, 'platforms': {'DIS': {'in': 15}, 'NFLX': {'ca': 20}}},
     {'id': 569094,
      'platforms': {'AMZN': {'in': 10, 'ca': 5}, 'NFLX': {'us': 12}}}]
    
    Login or Signup to reply.
  2. A pure pandas approach:

    out = (df
       # form the region dictionaries
       .groupby(['id', 'platform'])
       .apply(lambda d: d.set_index('region')['num'].to_dict())
       .groupby('id')
       # aggregate by platform
       .agg(lambda d: d.droplevel('id').to_dict())
       .reset_index(name='platforms').to_dict('records')
     )
    

    Output:

    [{'id': 385687,
      'platforms': {'DIS': {'in': 15},
                    'NFLX': {'ca': 20}}},
     {'id': 569094,
      'platforms': {'AMZN': {'in': 10, 'ca': 5},
                    'NFLX': {'us': 12}}},
    ]
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search