skip to Main Content

I would appreciate your help to aggregate dataset. I have a dataset something link below, and I want to aggregate ids if the other values are the same.

Current Data:

dataset= [
 {'title' : 'XYZ', 'Description' : 'XYZ Description', 'instance_id' : 'instance A', 'instance_name' : 'name A'},
 {'title' : 'XYZ', 'Description' : 'XYZ Description', 'instance_id' : 'instance B', 'instance_name' : 'name B'},
 {'title' : 'ABC', 'Description' : 'ABC Description', 'instance_id' : 'instance B', 'instance_name' : 'name B'},
 {'title' : 'ABC', 'Description' : 'ABC Description', 'instance_id' : 'instance C', 'instance_name' : 'name C'}
]

I want to get instance details for each title and description. Desired result should be as following:

dataset= [
{'title' : 'XYZ', 'Description' : 'XYZ Description', 'instance_info': {'instance_id' :['instance A', 'instance B'],'instance_name' : ['name A', 'name B']}},
{'title' : 'ABC', 'Description' : 'ABC Description', 'instance_info': {'instance_id' :['instance B', 'instance C'],'instance_name' : ['name B', 'name C']}}
]

I hope I could explain myself.

I tried to do it with aggregate item module but could not manage to work.

2

Answers


  1. You can do it using with this way;

    dataset = [
        {'title': 'XYZ', 'Description': 'XYZ Description', 'instance_id': 'instance A', 'instance_name': 'name A'},
        {'title': 'XYZ', 'Description': 'XYZ Description', 'instance_id': 'instance B', 'instance_name': 'name B'},
        {'title': 'ABC', 'Description': 'ABC Description', 'instance_id': 'instance B', 'instance_name': 'name B'},
        {'title': 'ABC', 'Description': 'ABC Description', 'instance_id': 'instance C', 'instance_name': 'name C'}
    ]
    
    new_dataset = {}
    for row in dataset:
        if row['title'] not in new_dataset.keys():
            new_dataset[row['title']] = {
                'title': row['title'],
                'Description': row['Description'],
                'instance_id': [row['instance_id']],
                'instance_name': [row['instance_name']]
            }
        else:
            new_dataset[row['title']]['instance_id'].append(row['instance_id'])
            new_dataset[row['title']]['instance_name'].append(row['instance_name'])
    
    print(list(new_dataset.values()))
    

    output:

    # output: 
    # [{'Description': 'XYZ Description',
    #   'instance_id': ['instance A', 'instance B'],
    #   'instance_name': ['name A', 'name B'],
    #   'title': 'XYZ'},
    #  {'Description': 'ABC Description',
    #   'instance_id': ['instance B', 'instance C'],
    #   'instance_name': ['name B', 'name C'],
    #   'title': 'ABC'}]
    
    Login or Signup to reply.
  2. This is a great use case for pandas:

    import pandas as pd
    pd.DataFrame(dataset).groupby(['title', 'Description'], as_index=False).agg(list).to_dict(orient="records")
    

    Output:

    [{'title': 'ABC', 'Description': 'ABC Description', 'instance_id': ['instance B', 'instance C'], 'instance_name': ['name B', 'name C']},
     {'title': 'XYZ', 'Description': 'XYZ Description', 'instance_id': ['instance A', 'instance B'], 'instance_name': ['name A', 'name B']}]
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search