I have a list of JSON objects that I need to group by the 'day' field and reformat. Here is an example of the data and what the final output should look like.

data = [{'info': {'area': 'USA', 'other': 'cat'}, 'day': '1-1-2012', 'num': 12},
    {'info': {'area': 'KSA', 'other': 'bat'}, 'day': '1-1-2012', 'num': 52},
    {'info': {'area': 'KSA', 'other': 'fat'}, 'day': '4-3-2012', 'num': 34},]

The desired output should be:

[{'1-1-2012': {'area' : {'USA', 'KSA'}, 'num': {12, 52}}, '4-3-2012': {'area': {'KSA'}, 'num': {34}}}]

I tried using pd.json_normalize() to turn the entire list into a dataframe first, but I believe there is an easier way to achieve the above output.

Thanks!

2 Answers


  1. Assuming you start with a dataframe created from data, you can extract the area values, group by day, and then convert back to JSON:

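    import pandas as pd

    # `data` is the list of dicts from the question
    df = pd.DataFrame(data)
    out = (df
          .assign(area=df['info'].apply(lambda d: d['area']))  # pull 'area' out of the nested dict
          .drop('info', axis=1)      # discard the rest of 'info'
          .groupby('day')
          .agg(list)                 # collect each column's values per day
          .to_json(orient='index')   # one JSON object keyed by day
          )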
    

    Output for your sample data:

    '{"1-1-2012":{"num":[12,52],"area":["USA","KSA"]},"4-3-2012":{"num":[34],"area":["KSA"]}}'
    

    Note that if your actual desired output is a dictionary with sets for values, you can change the aggregation to set and replace the call to to_json with to_dict:

    out = (df
          .assign(area=df['info'].apply(lambda d:d['area']))
          .drop('info',axis=1)
          .groupby('day')
          .agg(set)
          .to_dict(orient='index')
          )
    

    Output:

    {
      '1-1-2012': {
        'num': {12, 52},
        'area': {'USA', 'KSA'}
      },
      '4-3-2012': {
        'num': {34},
        'area': {'KSA'}
      }
    }
    
  2. I think your desired data structure is probably wrong in some fundamental way, but assuming for the sake of argument that this lossy transformation is what you want, you could do:

    
    data = [{'info': {'area': 'USA', 'other': 'cat'}, 'day': '1-1-2012', 'num': 12},
        {'info': {'area': 'KSA', 'other': 'bat'}, 'day': '1-1-2012', 'num': 52},
        {'info': {'area': 'KSA', 'other': 'fat'}, 'day': '4-3-2012', 'num': 34},]
    
    result = {}
    
    for d in data:
        day = d['day']
        row = result.setdefault(day, {'area': set(), 'num': set()})
        row['area'].add(d['info']['area'])
        row['num'].add(d['num'])
    

    which gives:

    >>> result
    {'1-1-2012': {'area': {'USA', 'KSA'}, 'num': {12, 52}}, '4-3-2012': {'area': {'KSA'}, 'num': {34}}}
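
A side note on the loop above: sets discard duplicates and ordering, and Python's json module cannot serialize a set directly (json.dumps raises TypeError). If either of those matters, a list-based variant of the same loop is one possible sketch. The names grouped and encoded are mine and do not come from either answer:

```python
import json
from collections import defaultdict

data = [{'info': {'area': 'USA', 'other': 'cat'}, 'day': '1-1-2012', 'num': 12},
        {'info': {'area': 'KSA', 'other': 'bat'}, 'day': '1-1-2012', 'num': 52},
        {'info': {'area': 'KSA', 'other': 'fat'}, 'day': '4-3-2012', 'num': 34}]

# Each new day starts with empty lists, so duplicates and row order are kept
grouped = defaultdict(lambda: {'area': [], 'num': []})

for d in data:
    row = grouped[d['day']]
    row['area'].append(d['info']['area'])
    row['num'].append(d['num'])

# Convert back to a plain dict; lists (unlike sets) serialize to JSON as-is
result = dict(grouped)
encoded = json.dumps(result)
```

Whether lists or sets are the better fit depends on whether repeated values within a day should be kept.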
    