
I want to remove the duplicates from each array in this JSON:

{
  "abc": [
    "five"
  ],
  "pqr": [
    "one",
    "one",
    "two",
    "two",
    "three",
    "three",
    "four",
    "four"
  ],
  "xyz": [
    "one",
    "one",
    "two",
    "two",
    "four"
  ]
}

The output I expect after removing the duplicates:

{
  "abc": [
    "five"
  ],
  "pqr": [
    "one",
    "two",
    "three",
    "four"
  ],
  "xyz": [
    "one",
    "two",
    "four"
  ]
}

I tried map, unique and group_by with jq, but nothing helped.


Answers


  1. unique can remove duplicates, but it automatically sorts the arrays, which may or may not be what you want.

    jq '.[] |= unique'
    
    {
      "abc": [
        "five"
      ],
      "pqr": [
        "four",
        "one",
        "three",
        "two"
      ],
      "xyz": [
        "four",
        "one",
        "two"
      ]
    }
    


    You can restore the original order by rebuilding the array from the sorted index positions of each unique item's first occurrence:

    jq '.[] |= [.[[index(unique[])] | sort[]]]'
    


    Or avoid sorting altogether by writing your own straightforward de-duplication with reduce:

    jq '.[] |= reduce .[] as $i ([]; . + if index($i) then [] else [$i] end)'
    


    In my tests, the latter performed best. Both order-preserving filters produce:

    {
      "abc": [
        "five"
      ],
      "pqr": [
        "one",
        "two",
        "three",
        "four"
      ],
      "xyz": [
        "one",
        "two",
        "four"
      ]
    }
    
  2. Here is a sort-free alternative for obtaining the distinct items in an array (or stream) while retaining the order of first occurrence.

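    The filter is a little more complex than it would otherwise be, for the sake of complete genericity: seen items are recorded per JSON type, so that, for example, the number 1 and the string "1" are not treated as duplicates of each other: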

    # generate a stream of the distinct items in `stream`
    # in order of first occurrence, without sorting
    def uniques(stream):
      foreach stream as $s ({};
         # key each item by its type, then by its string form
         ($s|type) as $t
         | (if $t == "string" then $s else ($s|tostring) end) as $y
         # emit $s only the first time its (type, key) pair is seen
         | if .[$t][$y] then .emit = false
           else .emit = true | (.item = $s) | (.[$t][$y] = true) end;
         if .emit then .item else empty end );
    

    Now it’s just a matter of applying this filter to your JSON. One possibility would be:

    map_values([uniques(.[])])
    
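To see why the index-based filter from the first answer restores the original order, it can be run one step at a time on the "xyz" array from the question. This is a sketch assuming a POSIX shell with jq on the PATH; the shell variable arr is just a name chosen for this example.

```shell
# Sample input: the "xyz" array from the question (variable name is illustrative)
arr='["one","one","two","two","four"]'

# Step 1: unique removes duplicates but sorts the items
echo "$arr" | jq -c 'unique'
# ["four","one","two"]

# Step 2: position of the first occurrence of each unique item
echo "$arr" | jq -c '[index(unique[])]'
# [4,0,2]

# Step 3: sort those positions back into document order
echo "$arr" | jq -c '[index(unique[])] | sort'
# [0,2,4]

# Step 4: pick the items at those positions
echo "$arr" | jq -c '[.[[index(unique[])] | sort[]]]'
# ["one","two","four"]
```

Because index returns the position of the first occurrence, later duplicates are never selected.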
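Both order-preserving filters from the first answer can also be run on the question's full input to check that they agree. This sketch makes the same assumptions (POSIX shell, jq on the PATH), and input is an illustrative variable name. One subtlety in the reduce version: index($i) returns 0 for an item already stored at position 0, which still counts as "seen" because jq treats only false and null as falsy, so 0 is truthy.

```shell
# The question's input on one line (variable name is illustrative)
input='{"abc":["five"],"pqr":["one","one","two","two","three","three","four","four"],"xyz":["one","one","two","two","four"]}'

# Filter 1: select first-occurrence positions, sorted back into order
echo "$input" | jq -c '.[] |= [.[[index(unique[])] | sort[]]]'

# Filter 2: append each item only if it is not already in the accumulator
echo "$input" | jq -c '.[] |= reduce .[] as $i ([]; . + if index($i) then [] else [$i] end)'

# Both commands print:
# {"abc":["five"],"pqr":["one","two","three","four"],"xyz":["one","two","four"]}
```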
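The second answer does not show a complete command line. One way to run it, sketched under the assumption of a POSIX shell with jq on the PATH, is to keep the definition in a shell variable (UNIQUES_DEF is a hypothetical name used only here) and prepend it to each jq program. The second command illustrates the genericity mentioned in that answer: items are tracked per type, so the number 1 and the string "1" are kept apart, while a repeated object is still collapsed.

```shell
# uniques/1 as defined in the second answer, kept in a shell variable
# (UNIQUES_DEF is a hypothetical name for this example)
UNIQUES_DEF='
def uniques(stream):
  foreach stream as $s ({};
     ($s|type) as $t
     | (if $t == "string" then $s else ($s|tostring) end) as $y
     | if .[$t][$y] then .emit = false
       else .emit = true | (.item = $s) | (.[$t][$y] = true) end;
     if .emit then .item else empty end );
'

# De-duplicate every array in the question's input, keeping first-occurrence order
echo '{"abc":["five"],"pqr":["one","one","two","two","three","three","four","four"],"xyz":["one","one","two","two","four"]}' |
  jq -c "$UNIQUES_DEF map_values([uniques(.[])])"
# {"abc":["five"],"pqr":["one","two","three","four"],"xyz":["one","two","four"]}

# Mixed types: 1 and "1" stay distinct, the repeated object is dropped
echo '[1,"1",1,{"a":1},{"a":1}]' | jq -c "$UNIQUES_DEF [uniques(.[])]"
# [1,"1",{"a":1}]
```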