Check for duplicate keys in a json file

user22990699
February 19, 2024
213 views
1 vote
2 Answers

I have a json file with the following contents:

{
    "id1": {
        "key": "value"
    },
    "id2": {
        "key": "value"
    }
}

I want to check that each top-level key, i.e. id1, id2, is present only once in the file, and produce an error if not. So something like

{
    "id1": {
        "key": "value"
    },
    "id1": {
        "key": "value"
    }
}

must be reported as an error.

Is there a way to do this with a JSON parser like jq or json-glib-validate?

I came up with a Python solution that works, but it would be nicer to use an actual parser.

This is supposed to be used in CI.

import collections
import json
import sys

def check_duplicates(pairs):
    # Count how often each key occurs within the object being parsed.
    count = collections.Counter(key for key, _ in pairs)
    duplicates = ", ".join(key for key, n in count.items() if n > 1)

    if duplicates:
        print("Duplicate keys found: {}".format(duplicates))
        sys.exit(1)

def validate(pairs):
    # json.load calls this hook for every object, nested ones included.
    check_duplicates(pairs)
    return dict(pairs)

with open("file.json", "r") as file:
    try:
        obj = json.load(file, object_pairs_hook=validate)
    except ValueError as e:
        print("Invalid json: %s" % e)
        sys.exit(1)
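
Since the hook above runs for every object, nested ones included, a variant restricted to top-level keys might look like the following. This is only a sketch: the function name find_top_level_duplicates is made up here, and it assumes the top-level JSON value is an object.

```python
import collections
import json


def find_top_level_duplicates(text):
    """Return the top-level keys that occur more than once (hypothetical helper)."""
    # Returning the pairs unchanged keeps every object as a list of
    # (key, value) tuples, so nothing is collapsed into a dict.
    # Assumption: the top-level JSON value is an object, not an array or scalar.
    top_level = json.loads(text, object_pairs_hook=lambda pairs: pairs)
    counts = collections.Counter(key for key, _ in top_level)
    return [key for key, n in counts.items() if n > 1]


sample = '{"id1": {"key": "value"}, "id1": {"key": "value"}}'
print(find_top_level_duplicates(sample))  # prints ['id1']
```

In CI, a non-empty result could be turned into a non-zero exit code with sys.exit(1), as in the script above.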

Tags: continuous-integration jq json python

Answers

- pmf
- February 19, 2024 at 3:24 pm
- 0 votes
> Is there a way to do this with an json parsers like jq

You can use jq’s stream representation, which is available either via the --stream flag or via the tostream function. The difference is that the function operates on input that has already been parsed (with duplicate keys collapsed), whereas the flag starts streaming while the input is being read, before any collapsing can occur. In bash, for example, you can simply compare the output of both, e.g. using diff:
```
$ diff -q <(jq --stream . nodupkeys.json) <(jq tostream nodupkeys.json)
# no output
```
```
$ diff -q <(jq --stream . withdupkeys.json) <(jq tostream withdupkeys.json)
Files /dev/fd/63 and /dev/fd/62 differ
```
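
The same distinction, what is seen while reading versus what is left after parsing, can be illustrated with Python's json module. This is a loose analogy to the jq behaviour described above, not a port of it; the names record and seen_keys are made up for the illustration.

```python
import json

text = '{"id1": {"key": "a"}, "id1": {"key": "b"}}'

# After parsing: duplicates have already been collapsed (the last one wins).
collapsed = json.loads(text)

# During parsing: the hook receives every pair of each object, repeats included.
seen_keys = []


def record(pairs):
    seen_keys.append([key for key, _ in pairs])
    return dict(pairs)


json.loads(text, object_pairs_hook=record)

print(collapsed)      # {'id1': {'key': 'b'}}
print(seen_keys[-1])  # ['id1', 'id1'] (the top-level object is completed last)
```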
Addressing @peak’s comment: the approach above fails when using itchyny/gojq instead of jqlang/jq. gojq automatically sorts the keys, but only after parsing the input, so the sorting affects the function and not the flag. A document with unordered keys but no duplicates would therefore still yield a difference.

The following (shell-free) approach tries to mitigate this by collecting all paths provided by the flag variant into an array, and then comparing its sorted version with its uniqued version (which is also automatically sorted). Any sorting performed by the processor’s implementation thus becomes irrelevant.
```
$ jq --stream -n '[inputs[-2] | arrays] | sort == unique' nodupkeys.json
true
```
```
$ jq --stream -n '[inputs[-2] | arrays] | sort == unique' withdupkeys.json
false
```
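
The idea behind the filter above, collecting every path seen while reading and checking whether any of them repeats, can be sketched in Python too. This is a rough analogue rather than a translation: Pairs, iter_paths and has_duplicate_paths are names made up here, and the sketch records the path of every key, not only the leaf paths that jq's stream emits.

```python
import json


class Pairs(list):
    """Marker type (hypothetical): a JSON object kept as raw (key, value) pairs."""


def iter_paths(node, prefix=()):
    """Yield the path of every object key, descending into arrays and objects."""
    if isinstance(node, Pairs):
        for key, value in node:
            path = prefix + (key,)
            yield path
            yield from iter_paths(value, path)
    elif isinstance(node, list):
        # Plain lists are JSON arrays; their indices become part of the path.
        for index, value in enumerate(node):
            yield from iter_paths(value, prefix + (index,))


def has_duplicate_paths(text):
    # Pairs(pairs) preserves repeated keys instead of collapsing them.
    tree = json.loads(text, object_pairs_hook=Pairs)
    paths = list(iter_paths(tree))
    return len(paths) != len(set(paths))


print(has_duplicate_paths('{"id1": {"key": "value"}, "id2": {"key": "value"}}'))  # False
print(has_duplicate_paths('{"id1": {"key": "value"}, "id1": {"key": "value"}}'))  # True
```

Comparing the number of paths with the size of their set plays the role of `sort == unique`, and does not depend on key order.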

- mufsalup
- February 19, 2024 at 3:25 pm
- 0 votes
Strictly speaking, keys in a JSON object are expected to be unique (RFC 8259 says the names within an object SHOULD be unique). So you could make your life much easier by changing the JSON to something like this:
```
{
    "value": {
        ...
    },
    "value": {
        ...
    }
}
```
There should also be a parser for that.