
Aggregating JSON items in a list in Python

g246zdeyldz
December 13, 2022

I would appreciate your help aggregating a dataset. I have a dataset like the one below, and I want to aggregate the ids when the other values are the same.

Current Data:

dataset= [
 {'title' : 'XYZ', 'Description' : 'XYZ Description', 'instance_id' : 'instance A', 'instance_name' : 'name A'},
 {'title' : 'XYZ', 'Description' : 'XYZ Description', 'instance_id' : 'instance B', 'instance_name' : 'name B'},
 {'title' : 'ABC', 'Description' : 'ABC Description', 'instance_id' : 'instance B', 'instance_name' : 'name B'},
 {'title' : 'ABC', 'Description' : 'ABC Description', 'instance_id' : 'instance C', 'instance_name' : 'name C'}
]

I want to get the instance details for each title and description. The desired result is as follows:

dataset= [
{'title' : 'XYZ', 'Description' : 'XYZ Description', 'instance_info': {'instance_id' :['instance A', 'instance B'],'instance_name' : ['name A', 'name B']}},
{'title' : 'ABC', 'Description' : 'ABC Description', 'instance_info': {'instance_id' :['instance B', 'instance C'],'instance_name' : ['name B', 'name C']}}
]

I hope that explains what I need.

I tried to do it with the aggregate item module but could not get it to work.

Answers

You can do it this way:

from pprint import pprint

dataset = [
    {'title': 'XYZ', 'Description': 'XYZ Description', 'instance_id': 'instance A', 'instance_name': 'name A'},
    {'title': 'XYZ', 'Description': 'XYZ Description', 'instance_id': 'instance B', 'instance_name': 'name B'},
    {'title': 'ABC', 'Description': 'ABC Description', 'instance_id': 'instance B', 'instance_name': 'name B'},
    {'title': 'ABC', 'Description': 'ABC Description', 'instance_id': 'instance C', 'instance_name': 'name C'}
]

# Group rows that share both title and Description, as the question asks.
new_dataset = {}
for row in dataset:
    key = (row['title'], row['Description'])
    if key not in new_dataset:
        # First row of a group: start the lists with this row's values.
        new_dataset[key] = {
            'title': row['title'],
            'Description': row['Description'],
            'instance_id': [row['instance_id']],
            'instance_name': [row['instance_name']]
        }
    else:
        # Later rows of the same group only extend the lists.
        new_dataset[key]['instance_id'].append(row['instance_id'])
        new_dataset[key]['instance_name'].append(row['instance_name'])

# pprint sorts dict keys alphabetically, which gives the layout shown below.
pprint(list(new_dataset.values()))

Output:

# [{'Description': 'XYZ Description',
#   'instance_id': ['instance A', 'instance B'],
#   'instance_name': ['name A', 'name B'],
#   'title': 'XYZ'},
#  {'Description': 'ABC Description',
#   'instance_id': ['instance B', 'instance C'],
#   'instance_name': ['name B', 'name C'],
#   'title': 'ABC'}]
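
The loop above returns flat records, while the question asks for the lists to be nested under 'instance_info'. Here is a minimal sketch of one way to build that shape directly, assuming rows should be grouped on both title and Description. The function name aggregate_instances is hypothetical and not part of the original answer.

```python
def aggregate_instances(rows):
    """Group rows by (title, Description), nesting instance lists under 'instance_info'."""
    grouped = {}
    for row in rows:
        key = (row['title'], row['Description'])
        # setdefault stores the new record only the first time a key is seen
        # and returns the existing record on later rows.
        record = grouped.setdefault(key, {
            'title': row['title'],
            'Description': row['Description'],
            'instance_info': {'instance_id': [], 'instance_name': []},
        })
        record['instance_info']['instance_id'].append(row['instance_id'])
        record['instance_info']['instance_name'].append(row['instance_name'])
    # Dicts preserve insertion order (Python 3.7+), so groups keep first-seen order.
    return list(grouped.values())


dataset = [
    {'title': 'XYZ', 'Description': 'XYZ Description', 'instance_id': 'instance A', 'instance_name': 'name A'},
    {'title': 'XYZ', 'Description': 'XYZ Description', 'instance_id': 'instance B', 'instance_name': 'name B'},
    {'title': 'ABC', 'Description': 'ABC Description', 'instance_id': 'instance B', 'instance_name': 'name B'},
    {'title': 'ABC', 'Description': 'ABC Description', 'instance_id': 'instance C', 'instance_name': 'name C'},
]

result = aggregate_instances(dataset)
print(result)
```

Keying on the (title, Description) tuple keeps two rows with the same title but different descriptions in separate groups.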

This is a great use case for pandas:

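import pandas as pd

# Collect every non-key column into a list per (title, Description) group.
# groupby sorts the group keys by default, so 'ABC' comes before 'XYZ'.
result = pd.DataFrame(dataset).groupby(['title', 'Description'], as_index=False).agg(list).to_dict(orient="records")
print(result)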

Output:

[{'title': 'ABC', 'Description': 'ABC Description', 'instance_id': ['instance B', 'instance C'], 'instance_name': ['name B', 'name C']},
 {'title': 'XYZ', 'Description': 'XYZ Description', 'instance_id': ['instance A', 'instance B'], 'instance_name': ['name A', 'name B']}]
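
The records above are still flat, whereas the question wants the lists nested under 'instance_info'. Below is a rough sketch of a post-processing step that does this using only the standard library. The helper name nest_instance_info and its fields parameter are assumptions made for illustration, and the flat list is copied from the output shown above.

```python
import json


def nest_instance_info(records, fields=('instance_id', 'instance_name')):
    """Move the given fields of each flat record under an 'instance_info' key."""
    nested = []
    for record in records:
        # Keep every key that is not an instance field at the top level.
        item = {k: v for k, v in record.items() if k not in fields}
        item['instance_info'] = {f: record[f] for f in fields}
        nested.append(item)
    return nested


# Flat records, copied from the pandas output above.
flat = [
    {'title': 'ABC', 'Description': 'ABC Description',
     'instance_id': ['instance B', 'instance C'], 'instance_name': ['name B', 'name C']},
    {'title': 'XYZ', 'Description': 'XYZ Description',
     'instance_id': ['instance A', 'instance B'], 'instance_name': ['name A', 'name B']},
]

nested = nest_instance_info(flat)
# json.dumps shows the result is plain JSON-serializable data.
print(json.dumps(nested, indent=2))
```

The same helper should also work on the output of the plain-loop answer, since both answers produce records with the same keys.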
