I have a JSON file with the following contents:
```json
{
    "id1": {
        "key": "value"
    },
    "id2": {
        "key": "value"
    }
}
```
I want to check that each top-level key (i.e. id1, id2) is present only once in the file, and to produce an error if not. So something like
```json
{
    "id1": {
        "key": "value"
    },
    "id1": {
        "key": "value"
    }
}
```
must be reported as an error.
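For reference, Python's standard json module does not reject such a document by default: it accepts the repeated name and keeps only the last value. The sketch below uses the duplicate-key document from above, with the two values changed to "first" and "second" (an illustrative assumption) so that it is visible which occurrence survives. It also shows that object_pairs_hook receives the raw pairs before they are collapsed, which is what the solution further down relies on.

```python
import json

# The duplicate-key document from the question, with distinct values
# so it is visible which occurrence survives.
doc = '''
{
    "id1": {"key": "first"},
    "id1": {"key": "second"}
}
'''

# By default, json.loads accepts the repeated name and keeps the last value.
parsed = json.loads(doc)
print(parsed)  # {'id1': {'key': 'second'}}

# With object_pairs_hook=list, each object is returned as its raw list of
# (key, value) pairs, so both occurrences are still visible.
raw = json.loads(doc, object_pairs_hook=list)
print([key for key, _ in raw])  # ['id1', 'id1']
```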
Is there a way to do this with a JSON parser such as jq or json-glib-validate?
I came up with a Python solution that works, but it would be nicer to use an actual parser. This is meant to run in CI.
```python
import collections
import json
import sys

def check_duplicates(pairs):
    # Count how often each key occurs in this object.
    count = collections.Counter(i for i, j in pairs)
    # Collect the keys as a list, so a duplicated empty-string key is not missed.
    duplicates = [i for i, j in count.items() if j > 1]
    if duplicates:
        print("Duplicate keys found: {}".format(", ".join(duplicates)))
        sys.exit(1)

def validate(pairs):
    # json.load calls this for every object (nested ones too) with its raw (key, value) pairs.
    check_duplicates(pairs)
    return dict(pairs)

with open("file.json", "r") as file:
    try:
        obj = json.load(file, object_pairs_hook=validate)
    except ValueError as e:
        print("Invalid json: %s" % e)
        sys.exit(1)
```
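One thing to be aware of: object_pairs_hook runs for every object, so the code above also flags repeated keys inside nested objects, which is stricter than "top-level keys only". A possible variant, sketched below under that reading of the requirement, checks only the outermost object, raises ValueError instead of calling sys.exit inside the hook, and can check several files in one CI step. The names find_duplicates, load_checked and main are illustrative, not from any library. It relies on the hook being called as each object finishes parsing, so the outermost object's call comes last.

```python
import collections
import json
import sys


def find_duplicates(pairs):
    """Return the names that occur more than once in a list of (key, value) pairs."""
    counts = collections.Counter(key for key, _ in pairs)
    return sorted(key for key, n in counts.items() if n > 1)


def load_checked(text):
    """Parse JSON text; raise ValueError if the top-level object repeats a key.

    object_pairs_hook is called as each object finishes parsing, so the
    last call belongs to the outermost object. Only that call is checked.
    """
    seen = []

    def hook(pairs):
        seen.append(pairs)
        return dict(pairs)  # nested duplicates collapse silently (last value wins)

    obj = json.loads(text, object_pairs_hook=hook)
    if isinstance(obj, dict):
        duplicates = find_duplicates(seen[-1])
        if duplicates:
            raise ValueError("duplicate top-level keys: " + ", ".join(duplicates))
    return obj


def main(paths):
    """Check each file; return 1 if any file is invalid or has duplicates, else 0."""
    status = 0
    for path in paths:
        try:
            with open(path, encoding="utf-8") as file:
                load_checked(file.read())
        except ValueError as e:  # json.JSONDecodeError is a ValueError subclass
            print(f"{path}: {e}", file=sys.stderr)
            status = 1
    return status
```

In CI this could be wired up with sys.exit(main(sys.argv[1:])) at the bottom of the script, so that every file is reported before the job fails, rather than stopping at the first problem.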
2 Answers
You can use jq's stream representation, which is available either via the --stream flag or via the tostream function. The difference is that with the function, the input has already been parsed (and duplicate keys collapsed), whereas the flag starts streaming while the input is being read, before any collapsing can occur. So, in bash for example, just compare the output of both, e.g. using diff.

Addressing @peak's comment: the approach above fails when using itchyny/gojq instead of jqlang/jq. gojq automatically sorts object keys, but only after parsing the input, so the function variant sees sorted keys while the flag variant does not. A document with unsorted keys but no duplicates would therefore still yield a difference.
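The same idea can be sketched in Python for readers without jq. This is a rough analogue, not jq's stream format: it lists the path of every member as read from the text (repeated keys kept, via object_pairs_hook) and compares that with the paths of the normally parsed, collapsed object. A second function simply asks whether any path repeats, which does not depend on key order, in the spirit of the order-independent sort/unique comparison described in this answer. Pairs, node_paths and the has_duplicates_* functions are hypothetical helper names.

```python
import json


class Pairs(list):
    """Marker type: a JSON object kept as its raw list of (key, value) pairs."""


def node_paths(value, prefix=()):
    """Yield the path of every value nested inside `value`, in document order."""
    if isinstance(value, Pairs):  # check before list: Pairs is a list subclass
        children = list(value)
    elif isinstance(value, dict):
        children = list(value.items())
    elif isinstance(value, list):
        children = list(enumerate(value))
    else:
        children = []  # scalars have no children
    for key, child in children:
        path = prefix + (key,)
        yield path
        yield from node_paths(child, path)


def paths_as_read(text):
    """Paths with repeated keys kept, roughly what a streaming reader sees."""
    return list(node_paths(json.loads(text, object_pairs_hook=Pairs)))


def has_duplicates_by_diff(text):
    """Order-sensitive: compare raw paths with the paths of the collapsed dict."""
    return paths_as_read(text) != list(node_paths(json.loads(text)))


def has_duplicates_by_unique(text):
    """Order-insensitive: a repeated key shows up as a path seen more than once."""
    paths = paths_as_read(text)
    return len(paths) != len(set(paths))
```

Paths are recorded for every member, not only for leaves, so a repeated key is caught even when its two values have different shapes (for example an array and an object).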
The following (shell-free) approach tries to mitigate this by instead collecting all paths produced by the flag variant into an array and then comparing its sorted version with its unique version (which is also sorted automatically). This makes any key sorting performed by the processor's implementation irrelevant.

Duplicate keys are not meaningful in JSON: RFC 8259 says that the names within an object SHOULD be unique, and many parsers that accept duplicates keep only the last value. So, to make your life much easier, change the JSON to something like this:
There should also be a parser for that.
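The restructured example this answer refers to is not shown. As one hypothetical layout (an assumption, not necessarily what the answer had in mind), the ids could become values in an array of objects. A normal parser then keeps every entry, and uniqueness becomes an ordinary check on the parsed data. The duplicate_ids helper is illustrative.

```python
import collections
import json

# Hypothetical restructured layout: the ids are values, not object keys,
# so nothing is silently collapsed while parsing.
doc = '''
[
    {"id": "id1", "key": "value"},
    {"id": "id1", "key": "value"}
]
'''


def duplicate_ids(entries):
    """Return the ids that appear on more than one entry, in sorted order."""
    counts = collections.Counter(entry["id"] for entry in entries)
    return sorted(i for i, n in counts.items() if n > 1)


print(duplicate_ids(json.loads(doc)))  # ['id1']
```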