
This is the input JSON. In this example, the key/value pair "foo": "bar" repeats at random positions. Order is not important, even though the pair appears to repeat alternately.

[
  {
    "foo": "bar",
    "id": "baz"
  },
  {
    "thud": "grunt",
    "id": "fum"
  },
  {
    "foo": "bar",
    "id": "noot"
  },
  {
    "zot": "toto",
    "id": "pluto"
  },
  {
    "foo": "bar",
    "id": "toto"
  }  
]

Whenever a key/value pair is repeated, rather than removing the duplicate, I want to add an additional key/value pair to that particular element.
The desired output would be:

[
  {
    "foo": "bar",
    "id": "baz"
  },
  {
    "thud": "grunt",
    "id": "fum"
  },
  {
    "foo": "bar",
    "id": "noot",
    "desc": "1st duplicate found"
  },
  {
    "zot": "toto",
    "id": "pluto"
  },
  {
    "foo": "bar",
    "id": "toto",
    "desc": "2nd duplicate found"
  } 
]

Again, order and numbering are not relevant or required; I added them for illustration only.

I found several solutions for removing duplicates, but I have been unable to make any headway on this.

I would appreciate any proposed solution to the above.

Thanks very much for your time.

I tried a complex solution that split the JSON into two and merged the parts using -n and --argjson, without much of a breakthrough.

2 Answers


  1. Here’s one approach using tostream and fromstream to deconstruct and reconstruct the input via its stream representation, which is a stream of arrays, each containing a path and its corresponding value. A foreach loop iterates over this stream, re-emitting each item for later reconstruction. It also keeps track of each path-value pair with the path’s first item removed (so matches occur irrespective of their position in the original input array) and counts each appearance. If that count is higher than one, it also outputs another item (distinguished by appending _dup to the last path item) with the current count as its value. Note that the if without an else branch requires jq 1.7 or later.

    fromstream(
      foreach (tostream | [., (.[0] |= .[1:] | @json)]) as [$s,$j] (
        {};
        if $s | has(1) then .[$j] += 1 end;
        if .[$j] > 1 then [($s[0] | last += "_dup"), .[$j]] else empty end,
        $s
      )
    )
    
    [
      {
        "foo": "bar",
        "id": "baz"
      },
      {
        "thud": "grunt",
        "id": "fum"
      },
      {
        "foo_dup": 2,
        "foo": "bar",
        "id": "noot"
      },
      {
        "zot": "toto",
        "id": "pluto"
      },
      {
        "foo_dup": 3,
        "foo": "bar",
        "id": "toto"
      }
    ]
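
    To see what the filter above operates on, it can help to print the stream representation directly. The following is a minimal sketch, assuming jq is on the PATH; the two-element input and the shell variable names (input, events, keys) are illustrative and not part of the answer.

    ```shell
    input='[{"foo":"bar","id":"baz"},{"foo":"bar","id":"noot"}]'

    # Every event is [path, value]; events without a value close an object or array.
    events=$(printf '%s' "$input" | jq -c 'tostream')
    printf '%s\n' "$events"

    # Leaf events only, with the leading array index dropped from each path.
    keys=$(printf '%s' "$input" | jq -c 'tostream | select(has(1)) | .[0] |= .[1:]')
    printf '%s\n' "$keys"
    ```

    The two foo events differ only in their leading index, so once that index is dropped they serialize to the same key, which is what the counter relies on.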
    


  2. Here’s an easy-to-understand solution, or at least it’s a solution to one interpretation of the problem. It has the advantage that it can easily be modified in accordance with other interpretations of the problem.

    reduce .[] as $kv ({ ans: [], count: {}};
      ($kv|del(.id)|tostring) as $pair
      | .count[$pair] += 1
      | .count[$pair] as $count
      | if $count == 1 then .ans += [$kv]
        else .ans += [$kv + {desc: "duplicate #\($count) of \($pair)"} ]
        end)
      | .ans
          
    

    Sample output:

    [
      {
        "foo": "bar",
        "id": "baz"
      },
      {
        "thud": "grunt",
        "id": "fum"
      },
      {
        "foo": "bar",
        "id": "noot",
        "desc": "duplicate #2 of {\"foo\":\"bar\"}"
      },
      {
        "zot": "toto",
        "id": "pluto"
      },
      {
        "foo": "bar",
        "id": "toto",
        "desc": "duplicate #3 of {\"foo\":\"bar\"}"
      }
    ]
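
    Following the remark above that this solution is easy to adapt, here is a sketch of one such modification that reproduces the desc text from the question. It assumes jq is on the PATH; ordinal is a hypothetical helper defined here, not a jq builtin, and the shell variable names are illustrative.

    ```shell
    input='[{"foo":"bar","id":"baz"},{"thud":"grunt","id":"fum"},{"foo":"bar","id":"noot"},{"zot":"toto","id":"pluto"},{"foo":"bar","id":"toto"}]'

    result=$(printf '%s' "$input" | jq '
      # ordinal: hypothetical helper, turns 1 into "1st", 2 into "2nd", and so on.
      def ordinal:
        if (. % 100) >= 11 and (. % 100) <= 13 then "\(.)th"
        elif (. % 10) == 1 then "\(.)st"
        elif (. % 10) == 2 then "\(.)nd"
        elif (. % 10) == 3 then "\(.)rd"
        else "\(.)th"
        end;

      reduce .[] as $kv ({ans: [], count: {}};
        # Everything except id identifies the pair.
        ($kv | del(.id) | tostring) as $pair
        | .count[$pair] += 1
        # Number of earlier occurrences of this pair.
        | (.count[$pair] - 1) as $dups
        | if $dups == 0 then .ans += [$kv]
          else .ans += [$kv + {desc: "\($dups | ordinal) duplicate found"}]
          end)
      | .ans
    ')
    printf '%s\n' "$result"
    ```

    Counting earlier occurrences rather than total occurrences is what makes the first repeat read "1st duplicate found" instead of "duplicate #2".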
    