
Consider a collection with the following documents:

[
  {
     "_id": "3981396a-9fcb-4c24-976f-d500f20c4fab",
     "entries": [
        {
           "key": "var1",
           "value": "value1"
        },
        {
           "key": "var1",
           "value": "value11"
        },
        {
           "key": "var2",
           "value": "value2"
        }
     ]
  }
]

What would be an appropriate approach to de-duplicate the entries of each document in the collection, so that each key appears only once? At a minimum, the query should find all documents with duplicated entries; looping over those manually would then be acceptable. Even better if it can all be done in a single aggregation pipeline.

The expected result is the following:

[
  {
     "_id": "3981396a-9fcb-4c24-976f-d500f20c4fab",
     "entries": [
        {
           "key": "var1",
           "value": "value1"
        },
        {
           "key": "var2",
           "value": "value2"
        }
     ]
  }
]

2 Answers


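  1. You can use $reduce to conditionally append to a placeholder array: starting from an empty array, append the current element only if its key is not already in the placeholder. Because the first entry seen for each key is kept, "value1" survives for "var1". Finally, $set replaces the entries array with the placeholder array. Passing a pipeline to update requires MongoDB 4.2 or newer.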

    db.collection.update({},
    [
      {
        $set: {
          entries: {
            "$reduce": {
              "input": "$entries",
              "initialValue": [],
              "in": {
                "$cond": {
                  "if": {
                    "$in": [
                      "$$this.key",
                      "$$value.key"
                    ]
                  },
                  "then": "$$value",
                  "else": {
                    "$concatArrays": [
                      "$$value",
                      [
                        "$$this"
                      ]
                    ]
                  }
                }
              }
            }
          }
        }
      }
    ],
    {
      multi: true
    })
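
Since the question says manual looping would also be acceptable, the same keep-the-first-entry-per-key logic can be sketched in plain JavaScript. This is only a client-side mirror of the $reduce expression above, not a MongoDB API; the helper name `dedupeEntries` and the `sample` array are made up for illustration.

```javascript
// Plain-JavaScript mirror of the $reduce logic: keep the first entry seen for each key.
// dedupeEntries is a hypothetical helper name, not part of any MongoDB driver.
function dedupeEntries(entries) {
  return entries.reduce((kept, entry) => {
    // Same test as {$in: ["$$this.key", "$$value.key"]} in the pipeline.
    const alreadyPresent = kept.some((e) => e.key === entry.key);
    // Same branches as $cond: keep the accumulator, or append the current element.
    return alreadyPresent ? kept : kept.concat([entry]);
  }, []);
}

// Sample data copied from the question.
const sample = [
  { key: "var1", value: "value1" },
  { key: "var1", value: "value11" },
  { key: "var2", value: "value2" },
];

console.log(dedupeEntries(sample));
```

Like the pipeline, this is quadratic in the number of entries per document, which is normally fine for small arrays.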
    


  2. Query

    • you can also do it with stage operators, in a slightly tricky way, using a "local" unwind
    • $lookup from a collection that contains exactly 1 empty document, passing the entries array in through let
    • this lets you use stage operators on the array members inside the lookup pipeline, e.g. a "local" unwind
    • inside the lookup pipeline: $unwind, then $group by the key and keep only the first value

    *I'm not suggesting this is the best way in your case, but this "local" unwind can be useful


    col.aggregate(
    [{"$lookup": 
       {"from": "dummy_collection_with_1_empty_doc",
        "pipeline": 
         [{"$set": {"entries": "$$entries"}},
          {"$unwind": "$entries"},
          {"$group": 
             {"_id": "$entries.key", "value": {"$first": "$entries.value"}}},
          {"$project": {"_id": 0, "key": "$_id", "value": 1}}],
        "as": "entries",
        "let": {"entries": "$entries"}}}])
    
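
For the first part of the question (just finding the documents that contain duplicated keys), one option is to compare the number of entries with the number of distinct keys. The sketch below is a possible filter, not something taken from either answer: it relies on the standard $expr, $size and $setUnion operators and assumes every document has an entries array whose elements all carry a key field. `hasDuplicateKeys` is a hypothetical local helper that mirrors the same check so it can be tried without a server.

```javascript
// Filter for find(): matches documents where the entry count exceeds the distinct-key count.
// Assumes every document has an "entries" array and every entry has a "key" field.
const duplicateKeysFilter = {
  $expr: {
    $gt: [
      { $size: "$entries" },
      // $setUnion with an empty array collapses the keys to their distinct values.
      { $size: { $setUnion: ["$entries.key", []] } },
    ],
  },
};
// In mongosh this would be used as: db.collection.find(duplicateKeysFilter)

// Hypothetical local mirror of the same check, for trying the idea without a server.
function hasDuplicateKeys(doc) {
  const keys = doc.entries.map((entry) => entry.key);
  return keys.length > new Set(keys).size;
}
```

The matching documents can then be fixed one by one, or the filter can be used as the first argument of the update so that only affected documents are rewritten.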