I have a JSON file like this:
{
    "parent1": {
        "child1": "bob"
    },
    "parent1": {
        "child2": "tom"
    },
    "parent2": {
        "child1": "jon"
    }
}
I want to merge the values under duplicate top-level keys, so the expected output is:
{
    "parent1": {
        "child1": "bob",
        "child2": "tom"
    },
    "parent2": {
        "child1": "jon"
    }
}
Currently I am using the following code to check duplicate keys:
import json

def check_duplicates(pairs):
    d = {}
    for key, val in pairs:
        if key in d:
            raise ValueError(f"Duplicate key(s) found: {key}")
        else:
            d[key] = val
    return d

filename = "test.json"
with open(filename, "r") as f:
    try:
        data = json.load(f, object_pairs_hook=check_duplicates)
    except ValueError as err:
        print(f"{filename}: Failed to decode: {err}")
Any idea how I can merge them?
2 Answers
This can be done with the object_pairs_hook parameter of json.load. A well-formed JSON object should not contain duplicate keys.
However, since real-world data is sometimes malformed, we occasionally need to clean it up.
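To see why a hook is needed, the following minimal sketch shows what json.loads does with duplicate keys by default and what an object_pairs_hook receives instead. The sample string and the show_pairs helper are made up for illustration.

```python
import json

# Hypothetical sample with a duplicated top-level key.
raw = '{"parent1": {"child1": "bob"}, "parent1": {"child2": "tom"}}'

# Default behaviour: the last duplicate silently replaces the earlier one.
default_result = json.loads(raw)
print(default_result)  # {'parent1': {'child2': 'tom'}}

def show_pairs(pairs):
    # The decoder calls the hook once per JSON object, innermost first,
    # passing a list of (key, value) tuples with duplicates preserved.
    return list(pairs)

pairs_result = json.loads(raw, object_pairs_hook=show_pairs)
print(pairs_result)
# [('parent1', [('child1', 'bob')]), ('parent1', [('child2', 'tom')])]
```

Because the hook sees both parent1 entries, it is the natural place to merge them or to reject them.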
I adapted the code from an SO answer and the answer above so that duplicate parent keys are merged and an error is raised when a child key is repeated.
valid.json:
invalid.json:
Output:
Explanation:
Duplicate parent key handling: The merge_duplicates function is passed as the object_pairs_hook parameter, so it receives the key-value pairs of each JSON object as the data is loaded. This lets the function detect and handle duplicate parent keys.
Child key conflict detection: For each parent key, the function checks whether that key already exists in the clean_data dictionary. If it does, it takes the new child key (within the duplicate parent key) using next(iter(child_dict)) and checks whether that child key already exists under the parent. If a duplicate child key is detected, a ValueError is raised, indicating the conflict. For example, repeating child1 under parent1 raises a ValueError for the invalid.json file.
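As a rough illustration of the two steps described above, here is a minimal sketch of what such a hook could look like. It is not the answer's original code: the body of merge_duplicates is an assumption, it compares child keys with a set intersection rather than next(iter(child_dict)), and the JSON is given as inline strings instead of the valid.json and invalid.json files.

```python
import json

def merge_duplicates(pairs):
    """object_pairs_hook that merges duplicate keys whose values are objects."""
    clean_data = {}
    for key, val in pairs:
        if key not in clean_data:
            clean_data[key] = val
        elif isinstance(clean_data[key], dict) and isinstance(val, dict):
            # Duplicate parent: refuse to overwrite a child that already exists.
            conflicts = clean_data[key].keys() & val.keys()
            if conflicts:
                raise ValueError(
                    f"Duplicate child key(s) {sorted(conflicts)} under {key!r}"
                )
            clean_data[key].update(val)
        else:
            # Duplicate key with a non-object value: nothing sensible to merge.
            raise ValueError(f"Duplicate key found: {key!r}")
    return clean_data

# Same structure as the file in the question.
valid = '''{"parent1": {"child1": "bob"},
            "parent1": {"child2": "tom"},
            "parent2": {"child1": "jon"}}'''
merged = json.loads(valid, object_pairs_hook=merge_duplicates)
print(merged)
# {'parent1': {'child1': 'bob', 'child2': 'tom'}, 'parent2': {'child1': 'jon'}}

# child1 is repeated under parent1, so this input is rejected.
invalid = '{"parent1": {"child1": "bob"}, "parent1": {"child1": "tom"}}'
try:
    json.loads(invalid, object_pairs_hook=merge_duplicates)
    error_message = None
except ValueError as err:
    error_message = str(err)
    print(f"Failed to decode: {err}")
```

To read from a file instead, pass the same hook to json.load(f, object_pairs_hook=merge_duplicates), as in the question's code.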