
I have a json file with the following contents:

{
    "id1": {
        "key": "value"
    },
    "id2": {
        "key": "value"
    }
}

I want to check that each top-level key, i.e. id1, id2, is present only once in the file, and produce an error if not. So something like

{
    "id1": {
        "key": "value"
    },
    "id1": {
        "key": "value"
    }
}

must be reported as an error.

Is there a way to do this with a JSON parser like jq or json-glib-validate?

I came up with a pythonic solution that works, but it would be nicer to use an actual parser.

This is supposed to be used in CI.

import collections
import json
import sys

def check_duplicates(pairs):
    # Count how often each key occurs within a single object.
    count = collections.Counter(key for key, _ in pairs)
    duplicates = ", ".join(key for key, n in count.items() if n > 1)

    if duplicates:
        print("Duplicate keys found: {}".format(duplicates))
        sys.exit(1)

def validate(pairs):
    check_duplicates(pairs)
    return dict(pairs)

with open("file.json", "r") as file:
    try:
        obj = json.load(file, object_pairs_hook=validate)
    except ValueError as e:
        print("Invalid json: %s" % e)
        sys.exit(1)
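Note that `object_pairs_hook` runs for every JSON object in the document, not just the outermost one, so this approach rejects duplicate keys at any nesting level (which is stricter than checking only the top level). A minimal standalone sketch of the same idea, raising instead of exiting (the function name and message are illustrative):

```python
import collections
import json

def raise_on_duplicates(pairs):
    # json.loads calls this hook for every object literal it parses,
    # so duplicates are caught at any nesting depth.
    counts = collections.Counter(key for key, _ in pairs)
    dupes = [key for key, n in counts.items() if n > 1]
    if dupes:
        raise ValueError("Duplicate keys found: {}".format(", ".join(dupes)))
    return dict(pairs)

json.loads('{"id1": 1, "id2": 2}', object_pairs_hook=raise_on_duplicates)  # fine
try:
    json.loads('{"id1": 1, "id1": 2}', object_pairs_hook=raise_on_duplicates)
except ValueError as e:
    print(e)
```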

2 Answers


  1. Is there a way to do this with a JSON parser like jq

    You can use jq’s stream representation, available either via the --stream flag or via the tostream function. The difference: the function operates on input that has already been parsed (with duplicate keys collapsed), whereas the flag starts streaming while the input is still being read, i.e. before any collapsing can occur. So, in bash for example, just compare the output of both, e.g. using diff:

    $ diff -q <(jq --stream . nodupkeys.json) <(jq tostream nodupkeys.json)
    # no output
    
    $ diff -q <(jq --stream . withdupkeys.json) <(jq tostream withdupkeys.json)
    Files /dev/fd/63 and /dev/fd/62 differ
    

    Addressing @peak’s comment: The approach above would fail with itchyny/gojq instead of jqlang/jq, because gojq automatically sorts keys, but only after parsing the input (i.e. in the function variant, not in the flag variant). So for a document with unordered keys but without duplicates, the two outputs would still differ.

    The following (shell-free) approach mitigates this by instead collecting all paths emitted by the flag variant into an array, then comparing its sorted version with its uniqued version (unique also sorts), rendering irrelevant any key sorting performed by the employed processor’s implementation.

    $ jq --stream -n '[inputs[-2] | arrays] | sort == unique' nodupkeys.json
    true
    
    $ jq --stream -n '[inputs[-2] | arrays] | sort == unique' withdupkeys.json
    false
    
  2. Duplicate keys are not allowed in JSON by definition (most parsers will silently keep only one of them). So it would make your life much easier to change the JSON to a structure that cannot express duplicates in the first place, for example by turning the ids into values:

    [
        {
            "id": "id1",
            ...
        },
        {
            "id": "id2",
            ...
        }
    ]
    

    Any standard parser can then handle the file, and the duplicate check becomes an ordinary check on the parsed data.
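With the identifiers stored as values rather than keys, the file parses unambiguously and the uniqueness check becomes trivial after parsing. A Python sketch, assuming a hypothetical array-of-objects layout with an `id` field:

```python
import json

# Hypothetical restructured document: the ids are values, so duplicates
# survive parsing and can be detected afterwards.
doc = '''
[
    {"id": "id1", "key": "value"},
    {"id": "id1", "key": "value"}
]
'''

entries = json.loads(doc)
ids = [entry["id"] for entry in entries]
duplicates = {i for i in ids if ids.count(i) > 1}
if duplicates:
    print("Duplicate ids found: {}".format(", ".join(sorted(duplicates))))
```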
