Merge JSON objects present under duplicate keys

sjko5
June 25, 2024
161 views
1 vote
2 Answers

I have a JSON file like this:

{
    "parent1": {
        "child1": "bob"
    },
    "parent1": {
        "child2": "tom"
    },
    "parent2": {
        "child1": "jon"
    }
}

I want to merge the values present under duplicate top-level keys. The expected output is:

{
    "parent1": {
        "child1": "bob",
        "child2": "tom"
    },
    "parent2": {
        "child1": "jon"
    }
}

Currently I am using the following code to check duplicate keys:

import json

def check_duplicates(pairs):
    d = {}
    for key, val in pairs:
        if key in d:
            raise ValueError(f"Duplicate key(s) found: {key}")
        else:
            d[key] = val
    return d

filename = "test.json"
with open(filename, "r") as f:
    try:
        data = json.load(f, object_pairs_hook=check_duplicates)
    except ValueError as err:
        print(f"{filename}: Failed to decode: {err}")

Any idea how I can merge them?

Tags: json python

Answers

Modify the function to merge the child dictionaries when duplicate keys are found.
Use this function as the object_pairs_hook in json.load.

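    import json
    
    def merge_duplicates(pairs):
        d = {}
        for key, val in pairs:
            # Merge only when both the stored and the new value are objects;
            # otherwise the later value replaces the earlier one, so a
            # repeated key with a string or number value does not crash.
            if isinstance(d.get(key), dict) and isinstance(val, dict):
                d[key].update(val)
            else:
                d[key] = val
        return d
    
    filename = "test.json"
    with open(filename, "r") as f:
        data = json.load(f, object_pairs_hook=merge_duplicates)
    
    print(json.dumps(data, indent=4))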
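
To see why a hook like this is able to merge duplicates, it may help to look at what the decoder actually passes to object_pairs_hook: one call per JSON object, inner objects first, each with the full list of (key, value) pairs, duplicates included. The sketch below just records those calls; record_pairs and calls are names made up for illustration.

```python
import json

calls = []  # hypothetical log of every pair list the decoder hands to the hook

def record_pairs(pairs):
    # Keep a copy of the raw pairs, then build a plain dict (last duplicate wins).
    calls.append(list(pairs))
    return dict(pairs)

text = '{"parent1": {"child1": "bob"}, "parent1": {"child2": "tom"}}'
result = json.loads(text, object_pairs_hook=record_pairs)

for pairs in calls:
    print(pairs)
```

The last recorded call holds both parent1 entries, and that is the point where a merging hook gets the chance to combine them.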

The JSON specification (RFC 8259) says that the keys within an object should be unique, so well-formed JSON should not contain duplicate keys.

However, as we may have invalid JSON data, we sometimes need to clean it.
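
For context, Python's json module does not reject duplicate keys by default. It keeps only the last value for a repeated key, so earlier data is lost without any warning. A small illustration with a made-up input string:

```python
import json

# Made-up input with a repeated top-level key.
text = '{"parent1": {"child1": "bob"}, "parent1": {"child2": "tom"}}'

data = json.loads(text)
print(data)  # the first "parent1" object is dropped silently
```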

I adapted the code from an SO answer and the answer above so that duplicate parents are merged and an error is raised when a child key is repeated under the same parent.

import json


def merge_duplicates(data):
    clean_data = {}
    for parent, child_dict in data:
        if parent in clean_data:
            child_key = next(iter(child_dict))
            if child_key in clean_data[parent]:
                raise ValueError(f"Duplicate child(s) ({child_key}) found on key: {parent}")
            else:
                clean_data[parent].update(child_dict)
        else:
            clean_data[parent] = child_dict
    return clean_data


filename = "valid.json"
with open(filename, "r") as f:
    data = json.load(f, object_pairs_hook=merge_duplicates)
print("valid.json", json.dumps(data, indent=4))

filename = "invalid.json"
with open(filename, "r") as f:
    data = json.load(f, object_pairs_hook=merge_duplicates)
print("invalid.json", json.dumps(data, indent=4))

valid.json:

{
    "parent1": {
        "child1": "bob"
    },
    "parent1": {
        "child2": "tom"
    },
    "parent1": {
        "child3": "eve"
    },
    "parent2": {
        "child1": "jon"
    }
}

invalid.json:

{
    "parent1": {
        "child1": "bob"
    },
    "parent1": {
        "child2": "tom"
    },
    "parent1": {
        "child1": "foo"
    },
    "parent2": {
        "child1": "jon"
    }
}

Output:

valid.json {
    "parent1": {
        "child1": "bob",
        "child2": "tom",
        "child3": "eve"
    },
    "parent2": {
        "child1": "jon"
    }
}
Traceback (most recent call last):
  File "/media/shovon/Codes/so/main.py", line 25, in <module>
    data = json.load(f, object_pairs_hook=merge_duplicates)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 293, in load
    return loads(fp.read(),
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/media/shovon/Codes/so/main.py", line 10, in merge_duplicates
    raise ValueError(f"Duplicate child(s) ({child_key}) found on key: {parent}")
ValueError: Duplicate child(s) (child1) found on key: parent1

Explanation:

Duplicate parent key handling: json.load calls merge_duplicates (passed as the object_pairs_hook parameter) for every JSON object it decodes and hands it the full list of key-value pairs, duplicates included. This lets the function detect and handle duplicate parent keys.
Child key conflict detection: For each pair, the function checks whether the parent key already exists in the clean_data dictionary. If it does, it takes the first child key of the new object with next(iter(child_dict)) and checks whether that key already exists under the parent. If so, a ValueError is raised to report the conflict; otherwise the new children are merged into the existing ones. For example, the repeated child1 under parent1 in invalid.json raises a ValueError.
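
Note that next(iter(child_dict)) only looks at the first key of the repeated object, so a clash on a later key would be overwritten silently. A possible variant, sketched here under the assumption that every colliding child key should be reported, compares the full key sets instead. merge_strict is a hypothetical name, and treating a repeated key with a non-object value as an error is my own choice, not something the question asks for.

```python
import json

def merge_strict(pairs):
    """Merge objects under repeated keys; raise if any child key collides."""
    merged = {}
    for key, value in pairs:
        if key not in merged:
            merged[key] = value
            continue
        existing = merged[key]
        # Only two JSON objects can be merged key by key (assumption: anything
        # else under a repeated key is treated as an error).
        if not (isinstance(existing, dict) and isinstance(value, dict)):
            raise ValueError(f"Duplicate key with non-object value: {key}")
        # Set intersection finds every child key present in both objects.
        clashes = sorted(existing.keys() & value.keys())
        if clashes:
            raise ValueError(f"Duplicate child(s) {clashes} found on key: {key}")
        existing.update(value)
    return merged

good = '{"p": {"a": 1}, "p": {"b": 2, "c": 3}}'
bad = '{"p": {"a": 1, "b": 2}, "p": {"c": 3, "b": 4}}'

print(json.loads(good, object_pairs_hook=merge_strict))
try:
    json.loads(bad, object_pairs_hook=merge_strict)
except ValueError as err:
    print(f"Rejected: {err}")
```

In the second input the clash is on "b", which is not the first key of the repeated object, so a check based on the first key alone would not catch it.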
