
How do you use jq to detect and report duplicate object keys? For example, the following JSON has a duplicate key at .a:

{
  "a":{
    "b": 1
  },
  "a":{
    "c": 1
  }
}

I think using --stream is required, but I can’t quite get it right.

Edit: I cannot assume that the duplicate exists only among the top-level keys. It may occur at any level.

2 Answers


  1. If you don’t mind running jq twice, you can produce the stream once "externally" with the --stream flag (before jq collapses duplicate keys) and once "internally" with the tostream filter (after they have been collapsed), and then diff the two results. Using jq -c keeps each event on one line, which reduces the amount of output to compare, and diff -qs only reports whether the two streams differ or are identical:

    diff -qs <(jq -c --stream . file.json) <(jq -c tostream file.json)
    
  2. Here’s an efficient solution for the case where there is exactly one top-level object; it reports duplicate keys of that object only. It’s efficient for several reasons, including that it’s a jq-only solution (e.g., no need for diff) and that jq is called only once.

    jq -n --stream '
    
    # "bag of words"
    def bow(stream): 
      reduce stream as $word ({}; .[($word|tostring)] += 1);
    
    # Emit a stream of the duplicated items in the array
    def duplicates:
       bow(.[]) | with_entries(select(.value > 1)) | keys_unsorted[];
    
    # Duplicate keys of the top-level object
    [inputs
     | select( (length == 1 and (.[0]|length==2)) or
               (length == 2 and (.[0]|length==1)) )
     | first|first ]
    | duplicates
    ' file.json
    

    Output for the sample input:

    "a"
    
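For duplicates at any depth (the case raised in the question’s edit), the counting idea from the second answer can be generalized from top-level keys to full paths. The following is only a sketch under a few assumptions: the input is a single JSON document, `dup_paths` is a hypothetical helper name (not part of jq), and a path is reported whenever it occurs more than once in the `--stream` output. Because of that last point, keys nested under a duplicated key may be reported as well, so the shortest reported path is the one to look at first.

```shell
# dup_paths: hypothetical helper (not part of jq). Prints every path that
# occurs more than once in the --stream events of one JSON document.
# Reads the files given as arguments, or stdin when there are none.
dup_paths() {
  jq -n -c --stream '
    reduce (
      inputs
      | if length == 2 then .[0]                 # [path, leaf]: path of a scalar or empty container
        elif (.[0] | length) > 1 then .[0][:-1]  # [path]: closing event, drop the last key to get the closed container
        else empty                               # closing event of the top-level value
        end
    ) as $path ({}; .[($path | tojson)] += 1)    # count each path, keyed by its JSON text
    | to_entries[]
    | select(.value > 1)
    | .key | fromjson
  ' "$@"
}

# Sample input from the question
printf '%s\n' '{"a":{"b":1},"a":{"c":1}}' | dup_paths
# prints ["a"]

# A duplicate nested inside an array element
printf '%s\n' '{"x":{"y":[{"k":1,"k":2}]}}' | dup_paths
# prints ["x","y",0,"k"]
```

Keying the counter by `tojson` keeps a numeric array index such as 0 distinct from a string key such as "0", and `fromjson` turns each reported key back into a path array.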