skip to Main Content

I have recently shifted to MongoDB and Mongoose with Node.js. And I am wrapping my head around it all coming from SQL.

I have a collection where documents have a similar structure to the following:

{
    name: String
    rank: Number
}

Sometimes the name might be the same, but the rank will always be different.

I would like to remove all duplicates of name, but retain the object that has the LOWEST rank.

For instance, if my collection looked like this:

{
    name: "name1"
    rank: 3
},
{
    name: "name1"
    rank: 4
},
{
    name: "name1"
    rank: 2
}

I would like to remove all objects where name is the same except for:

{
    name: "name1"
    rank: 2
}

Is this possible to do with mongoose?

2

Answers


  1. Chosen as BEST ANSWER

    Okay, I figured it out using aggregate:

    const duplicates = await collectionName.aggregate([
      {
        $group: {
          _id: "$name",
          dups: { $addToSet: "$_id" },
          count: { $sum: 1 }
        }
      },
      {
        $match: {
          count: { $gt: 1 }
        }
      }
    ]);
    
    duplicates.forEach(async (item) => {
      const duplicate_names = item.dups;
      const duplicate_name = await collectionName.find({ _id: { $in: duplicate_names } }).sort({ rank: 1 });
    
      duplicate_name.shift();
    
      duplicate_name.forEach(async (item) => {
        await collectionName.deleteOne({ _id: item._id });
      });
    });
    

  2. Here is my approach:

    const found = await db.collection.aggregate([
      {
        $group: {
          _id: "$name",
          minRank: {
            $min: "$rank"
          }
        }
      },
    ])
    
    
    await db.collection.deleteMany({ 
      $or: found.map(item => ({
        name: item._id,
        rank: { $ne: item.minRank }
      }))
    })
    

    Explanation:

    From my point of view your solution would result in many unnecessary calls being made, which would result in a terrible time of execution. My solution exactly contains two steps:

    1. find for each document’s property name the corresponding lowest rank available.
    2. delete each document, where the name is equal to one of those names and the rank is not equal to the actual lowest rank found.

    Additional notes:

    If not already done, you should probably define an index on the name property of your schema for performance reasons.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search