skip to Main Content

i have a collection with more then 1000 documents and there are some documents with same value in some fields, i need to get those

the collection is:

[{_id,fields1,fields2,fields3,etc...}]

what query can i use to get all the elements that have the same 3 fields for example:

[
 {_id:1,fields1:'a',fields2:1,fields3:'z'},
 {_id:2,fields1:'a',fields2:1,fields3:'z'},
 {_id:3,fields1:'f',fields2:2,fields3:'g'},
 {_id:4,fields1:'f',fields2:2,fields3:'g'},
 {_id:5,fields1:'j',fields2:3,fields3:'g'},
]

i need to get

[
 {_id:2,fields1:'a',fields2:1,fields3:'z'},
 {_id:4,fields1:'f',fields2:2,fields3:'g'},
]

in this way i can easly get a list of "duplicate" that i can delete if needed, it’s not really important get id 2 and 4 or 1 and 3

but 5 would never be included as it’s not ‘duplicated’

EDIT:
sorry but i forgot to mention that there are some document with null value i need to exclude those

2

Answers


  1. I think you can try this aggregation query:

    • First group by the feilds you want to know if there are multiple values.
    • It creates an array with the _ids that are repeated.
    • Then get only where there is more than one ($match).
    • And last project to get the desired output. I’ve used the first _id found.
    db.collection.aggregate([
      {
        "$group": {
          "_id": {
            "fields1": "$fields1",
            "fields2": "$fields2",
            "fields3": "$fields3"
          },
          "duplicatesIds": {
            "$push": "$_id"
          }
        }
      },
      {
        "$match": {
          "$expr": {
            "$gt": [
              {
                "$size": "$duplicatesIds"
              },
              1
            ]
          }
        }
      },
      {
        "$project": {
          "_id": {
            "$arrayElemAt": [
              "$duplicatesIds",
              0
            ]
          },
          "fields1": "$_id.fields1",
          "fields2": "$_id.fields3",
          "fields3": "$_id.fields2"
        }
      }
    ])
    

    Example here

    Login or Signup to reply.
  2. This is the perfect use case of window field. You can use $setWindowFields to compute $rank in the grouping/partition you want. Then, get those rank not equal to 1 to get the duplicates.

    db.collection.aggregate([
      {
        $match: {
          fields1: {
            $ne: null
          },
          fields2: {
            $ne: null
          },
          fields3: {
            $ne: null
          }
        }
      },
      {
        "$setWindowFields": {
          "partitionBy": {
            fields1: "$fields1",
            fields2: "$fields2",
            fields3: "$fields3"
          },
          "sortBy": {
            "_id": 1
          },
          "output": {
            "duplicateRank": {
              "$rank": {}
            }
          }
        }
      },
      {
        $match: {
          duplicateRank: {
            $ne: 1
          }
        }
      },
      {
        $unset: "duplicateRank"
      }
    ])
    

    Mongo Playground

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search