I have two JSON objects. I want to merge them, but wherever two entries have the same id, their obj_count fields should be summed. Is there a way to do this in Python?

Here is an example.
This is the first JSON object:

[
    {"text": " pen and ink and watercolour", "id": "x32505 ", "obj_count": 1855},
    {"text": " watercolour", "id": "x33202 ", "obj_count": 674},
    {"text": "pencil", "id": "AAT16013 ", "obj_count": 297}
]

And here is the second JSON object:

[
    {"text": " pen and ink and watercolour", "id": "x32505 ", "obj_count": 807},
    {"text": " watercolour", "id": "x33202 ", "obj_count": 97},
    {"text": " ink", "id": "AAT15012 ", "obj_count": 297}
]

What I want is something like this:

[
    {"text": " pen and ink and watercolour", "id": "x32505 ", "obj_count": 2662},  # summed
    {"text": " watercolour", "id": "x33202 ", "obj_count": 771},  # summed
    {"text": " ink", "id": "AAT15012 ", "obj_count": 297},
    {"text": "pencil", "id": "AAT16013 ", "obj_count": 297}
]

2 Answers


  1. Yes.

    Any loading/saving can be done with the json module (it is not used below; the code works on lists of dicts that are already parsed).

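    def sum_list_of_dict(source, add):
        """Merge `add` into `source`, summing obj_count where ids match.

        Note: `source` is modified in place and also returned.
        """
        for add_elem in add:
            found = False
            for source_elem in source:
                if add_elem["id"] == source_elem["id"]:
                    source_elem["obj_count"] += add_elem["obj_count"]
                    found = True
                    break  # each id is assumed to appear at most once per list
            if not found:
                # id not seen in source yet: keep the entry as-is
                source.append(add_elem)
        return source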
    
    
    data1 = [
        {"text": "pen and ink and watercolour", "id": "x32505", "obj_count": 1855},
        {"text": "watercolour", "id": "x33202", "obj_count": 674},
        {"text": "pencil", "id": "AAT16013", "obj_count": 297},
    ]
    
    data2 = [
        {"text": "pen and ink and watercolour", "id": "x32505", "obj_count": 807},
        {"text": "watercolour", "id": "x33202", "obj_count": 97},
        {"text": "ink", "id": "AAT15012", "obj_count": 297},
    ]
    
    data3 = sum_list_of_dict(data1, data2)
    
    # just for pretty printing
    from pprint import pprint
    pprint(data3)
    

    Output:

    [{'id': 'x32505', 'obj_count': 2662, 'text': 'pen and ink and watercolour'},
     {'id': 'x33202', 'obj_count': 771, 'text': 'watercolour'},
     {'id': 'AAT16013', 'obj_count': 297, 'text': 'pencil'},
     {'id': 'AAT15012', 'obj_count': 297, 'text': 'ink'}]
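
The loading and saving step mentioned at the top of this answer could look roughly like the sketch below. It is only an illustration: the function name `merge_json_strings` and the variables `json_a`/`json_b` are made up for this example, and the merge inside it is a compact dict-based stand-in rather than the exact function above. It assumes each input is a JSON array of objects that all have `id` and `obj_count` fields.

```python
import json

# Hypothetical inputs: in practice these strings might be read from files
# (json.load works on file objects) or from an HTTP response.
json_a = """[
    {"text": "pen and ink and watercolour", "id": "x32505", "obj_count": 1855},
    {"text": "watercolour", "id": "x33202", "obj_count": 674},
    {"text": "pencil", "id": "AAT16013", "obj_count": 297}
]"""

json_b = """[
    {"text": "pen and ink and watercolour", "id": "x32505", "obj_count": 807},
    {"text": "watercolour", "id": "x33202", "obj_count": 97},
    {"text": "ink", "id": "AAT15012", "obj_count": 297}
]"""


def merge_json_strings(first, second):
    """Parse two JSON arrays, sum obj_count per id, return a JSON string."""
    merged = {}
    for item in json.loads(first) + json.loads(second):
        if item["id"] in merged:
            # id already seen: add to the running total
            merged[item["id"]]["obj_count"] += item["obj_count"]
        else:
            # freshly parsed dict, so it is safe to store it directly
            merged[item["id"]] = item
    return json.dumps(list(merged.values()), indent=2)


merged_json = merge_json_strings(json_a, json_b)
print(merged_json)
```

To write the result to disk instead, `json.dump(data, file_object)` can be used on the merged list in place of `json.dumps`.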
    
  2. Use a dict keyed by id to keep track of whether you have seen an id or not:

    • if you have, sum their obj_count
    • if you haven’t, just save the item

    values_a = [
        {"text": " pen and ink and watercolour", "id": "x32505 ", "obj_count": 1855},
        {"text": " watercolour", "id": "x33202 ", "obj_count": 674},
        {"text": "pencil", "id": "AAT16013 ", "obj_count": 297}
    ]
    
    values_b = [
        {"text": " pen and ink and watercolour", "id": "x32505 ", "obj_count": 807},
        {"text": " watercolour", "id": "x33202 ", "obj_count": 97},
        {"text": " ink", "id": "AAT15012 ", "obj_count": 297}
    ]
    
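    result = {}
    for item in [*values_a, *values_b]:
        if item['id'] in result:
            # id already seen: add to the stored total
            result[item['id']]['obj_count'] += item['obj_count']
        else:
            # first occurrence: store a copy so the input lists stay unchanged
            result[item['id']] = dict(item)
    
    # back to list of items
    result = list(result.values())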
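
If more than two lists need to be merged, the same dict-based idea can be wrapped in a small function. This is a minimal sketch, not part of the original answer: the name `merge_counts` and its `key`/`field` parameters are hypothetical, and it assumes every item contains both fields. Each first occurrence is shallow-copied, so the input lists are left untouched.

```python
from itertools import chain


def merge_counts(*lists, key="id", field="obj_count"):
    """Merge any number of lists of dicts, summing `field` per `key`."""
    merged = {}
    for item in chain.from_iterable(lists):
        k = item[key]
        if k in merged:
            # key already seen: add to the running total
            merged[k][field] += item[field]
        else:
            # first occurrence: shallow copy so the caller's dict is not modified
            merged[k] = dict(item)
    return list(merged.values())


# Small illustrative inputs (made up for this example)
first = [{"id": "x32505", "obj_count": 1855}, {"id": "AAT16013", "obj_count": 297}]
second = [{"id": "x32505", "obj_count": 807}]
third = [{"id": "x32505", "obj_count": 1}, {"id": "AAT15012", "obj_count": 297}]

totals = merge_counts(first, second, third)
print(totals)
```

Keys are compared exactly, so `"x32505 "` (with a trailing space, as in the question's data) and `"x32505"` would count as different ids. If whitespace is inconsistent between the inputs, strip the ids before merging.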
    