
I have a JSON file like this:

{
    "parent1": {
        "child1": "bob"
    },
    "parent1": {
        "child2": "tom"
    },
    "parent2": {
        "child1": "jon"
    }
}

I want to merge the values under duplicate top-level keys, so the expected output is:

{
    "parent1": {
        "child1": "bob",
        "child2": "tom"
    },
    "parent2": {
        "child1": "jon"
    }
}

Currently I am using the following code to check for duplicate keys:

import json

def check_duplicates(pairs):
    d = {}
    for key, val in pairs:
        if key in d:
            raise ValueError(f"Duplicate key(s) found: {key}")
        else:
            d[key] = val
    return d

filename = "test.json"
with open(filename, "r") as f:
    try:
        data = json.load(f, object_pairs_hook=check_duplicates)
    except ValueError as err:
        print(f"{filename}: Failed to decode: {err}")

Any idea how I can merge them?

2 Answers


    1. Modify the function to merge the child dictionaries when duplicate keys are found.
    2. Use this function as the object_pairs_hook in json.load.
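        import json
        
        def merge_duplicates(pairs):
            d = {}
            for key, val in pairs:
                if isinstance(d.get(key), dict) and isinstance(val, dict):
                    # Repeated key whose values are both objects: merge the children
                    d[key].update(val)
                else:
                    # First occurrence, or a non-object duplicate: keep the latest value
                    # (the same last-one-wins rule json.load applies by default)
                    d[key] = val
            return d
        
        filename = "test.json"
        with open(filename, "r") as f:
            # The hook is called for every JSON object, nested ones included
            data = json.load(f, object_pairs_hook=merge_duplicates)
        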
        print(json.dumps(data, indent=4))
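`dict.update()` merges only one level deep: if two copies of a repeated parent both contained the same nested object key, the later nested object would replace the earlier one. If that case matters, a recursive variant is one possible extension. This is a sketch going beyond what the question asks; `deep_merge`, `merge_duplicates_deep` and the `meta`/`age`/`city` data are hypothetical names used only for illustration.

```python
import json


def deep_merge(target, source):
    """Recursively merge dict `source` into dict `target` (in place)."""
    for key, val in source.items():
        if isinstance(target.get(key), dict) and isinstance(val, dict):
            # Both sides are objects: descend instead of overwriting
            deep_merge(target[key], val)
        else:
            # Scalars, lists, or new keys: the later value wins
            target[key] = val
    return target


def merge_duplicates_deep(pairs):
    merged = {}
    for key, val in pairs:
        if isinstance(merged.get(key), dict) and isinstance(val, dict):
            # Repeated key with object values: merge recursively rather than update()
            deep_merge(merged[key], val)
        else:
            merged[key] = val
    return merged


# Hypothetical input: "parent1" is repeated and both copies contain "meta"
RAW = """
{
    "parent1": {"child1": "bob", "meta": {"age": 30}},
    "parent1": {"child2": "tom", "meta": {"city": "Paris"}},
    "parent2": {"child1": "jon"}
}
"""

data = json.loads(RAW, object_pairs_hook=merge_duplicates_deep)
print(json.dumps(data, indent=4))
```

With the plain `update()` version, `parent1["meta"]` would end up as just `{"city": "Paris"}`; the recursive version keeps both `age` and `city`.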
    
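  1. Strictly speaking, JSON object keys should be unique: RFC 8259 says the names within an object SHOULD be unique, and that software receiving an object with repeated names behaves unpredictably.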

    However, real-world JSON sometimes does contain duplicate keys, so we occasionally need to clean it up.
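For reference, without a hook Python's `json` module does not complain about repeated keys: when the pairs of an object are collapsed into a dict, the last occurrence wins and the earlier values are silently dropped. A small check on the question's data:

```python
import json

RAW = '{"parent1": {"child1": "bob"}, "parent1": {"child2": "tom"}, "parent2": {"child1": "jon"}}'

# No object_pairs_hook: each object becomes a plain dict, so a repeated
# key simply overwrites the value stored for its earlier occurrence
data = json.loads(RAW)
print(data)
# {'parent1': {'child2': 'tom'}, 'parent2': {'child1': 'jon'}}
```

This is why the merging has to happen inside `object_pairs_hook`, before the duplicates are collapsed.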

    I adapted the code from an SO answer and from the answer above so that, in addition to merging, it raises an error when a repeated parent would overwrite an existing child key.

    import json
    
    
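    def merge_duplicates(pairs):
        clean_data = {}
        for parent, child_dict in pairs:
            if parent in clean_data and isinstance(clean_data[parent], dict) and isinstance(child_dict, dict):
                for child_key in child_dict:  # check every child key, not only the first
                    if child_key in clean_data[parent]:
                        raise ValueError(f"Duplicate child(s) ({child_key}) found on key: {parent}")
                clean_data[parent].update(child_dict)
            else:
                # New parent key, or a non-object value (the last occurrence wins)
                clean_data[parent] = child_dict
        return clean_data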
    
    
    filename = "valid.json"
    with open(filename, "r") as f:
        data = json.load(f, object_pairs_hook=merge_duplicates)
    print("valid.json", json.dumps(data, indent=4))
    
    filename = "invalid.json"
    with open(filename, "r") as f:
        data = json.load(f, object_pairs_hook=merge_duplicates)
    print("invalid.json", json.dumps(data, indent=4))
    

    valid.json:

    {
        "parent1": {
            "child1": "bob"
        },
        "parent1": {
            "child2": "tom"
        },
        "parent1": {
            "child3": "eve"
        },
        "parent2": {
            "child1": "jon"
        }
    }
    

    invalid.json:

    {
        "parent1": {
            "child1": "bob"
        },
        "parent1": {
            "child2": "tom"
        },
        "parent1": {
            "child1": "foo"
        },
        "parent2": {
            "child1": "jon"
        }
    }
    

    Output:

    valid.json {
        "parent1": {
            "child1": "bob",
            "child2": "tom",
            "child3": "eve"
        },
        "parent2": {
            "child1": "jon"
        }
    }
    Traceback (most recent call last):
      File "/media/shovon/Codes/so/main.py", line 25, in <module>
        data = json.load(f, object_pairs_hook=merge_duplicates)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3.11/json/__init__.py", line 293, in load
        return loads(fp.read(),
               ^^^^^^^^^^^^^^^^
      File "/usr/lib/python3.11/json/__init__.py", line 359, in loads
        return cls(**kw).decode(s)
               ^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3.11/json/decoder.py", line 353, in raw_decode
        obj, end = self.scan_once(s, idx)
                   ^^^^^^^^^^^^^^^^^^^^^^
      File "/media/shovon/Codes/so/main.py", line 10, in merge_duplicates
        raise ValueError(f"Duplicate child(s) ({child_key}) found on key: {parent}")
    ValueError: Duplicate child(s) (child1) found on key: parent1
    

    Explanation:

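    • Duplicate parent key handling: merge_duplicates is passed to json.load as object_pairs_hook, so the decoder calls it with the list of (key, value) pairs of every JSON object it decodes (nested objects included), in document order and before any duplicate keys are collapsed. That is what lets the function detect a repeated parent key and merge its children instead of losing them.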

    • Child key conflict detection: When a parent key is already present in clean_data, the function checks whether an incoming child key already exists under that parent. If it does, a ValueError is raised instead of silently overwriting the earlier value; otherwise the new children are merged in with update(). For example, parent1 defines child1 twice in invalid.json, so loading that file raises the ValueError shown above.
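To see why the hook is able to catch repeated parent keys, it helps to log what `object_pairs_hook` actually receives. A minimal sketch (the `recording_hook` name and the `calls` list are illustrative only): the decoder calls the hook once per JSON object, innermost objects first, and passes the pairs as a list in document order with duplicates still present.

```python
import json

calls = []  # every pairs list the hook receives, in call order


def recording_hook(pairs):
    calls.append(pairs)
    return dict(pairs)  # same as the default: the last duplicate wins


RAW = '{"parent1": {"child1": "bob"}, "parent1": {"child2": "tom"}}'
json.loads(RAW, object_pairs_hook=recording_hook)

for pairs in calls:
    print(pairs)
# [('child1', 'bob')]
# [('child2', 'tom')]
# [('parent1', {'child1': 'bob'}), ('parent1', {'child2': 'tom'})]
```

The last call shows both `parent1` entries side by side, which is exactly the input `merge_duplicates` works on.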
