I have a list of JSONs that I need to groupby the ‘day’ field and format. Here is an example of the the data and what the final output should look like.
data = [{'info': {'area': 'USA', 'other': 'cat'}, 'day': '1-1-2012', 'num': 12},
{'info': {'area': 'KSA', 'other': 'bat'}, 'day': '1-1-2012', 'num': 52},
{'info': {'area': 'KSA', 'other': 'fat'}, 'day': '4-3-2012', 'num': 34},]
The desired output should be:
[{'1-1-2012': {'area' : {'USA', 'KSA'}, 'num': {12, 52}}, '4-3-2012': {'area': {'KSA'}, 'num': {34}}}]
I tried using pd.json_normmalize() to make the entire list into a dataframe first but I believe there is an easier way to achieve the above output.
Thanks!
2
Answers
Assuming you start with a dataframe created from
data
, you can extract the area values,groupby
the day and then convert back to JSON:Output for your sample data:
Note that if your actual desired output is a dictionary with sets for values, you can change the aggregation to
set
and replace the call toto_json
withto_dict
:Output:
I think your desired data structure is probably wrong in some fundamental way, but assuming for the sake of argument that this lossy transformation is what you want, you could do:
which gives: