skip to Main Content

I have the following two collections, Player and Team. These two collections each contains lots of documents. Here is an example with some dummy data:

Player

_id: ObjectId('6098c5ab615d9e23543d0f6e'),
name: 'Frank',
age: 27,
position: 'striker'

_id: ObjectId('61409537c4f113354782eea3'),
name: 'Jan',
age: 19,
position: 'midfielder'

Team

_id: ObjectId('61961ce8981cfc332fd71711'),
name: 'Team A',
players: [
    {
        refId: ObjectId('6098c5ab615d9e23543d0f6e'),
        goals: 13
    },
    {
        refId: ObjectId('61409537c4f113354782eea3'),
        goals: 6
    },
    {
        refId: ObjectId('61fcfb7d0fcb4d4c284bdefa'),
        goals: 2
    }
]

_id: ObjectId('61fd27c152d4a56752b8f05c'),
name: 'Team B',
players: [
    {
        refId: ObjectId('6218a8af0486c46703414aba'),
        goals: 2
    },
    {
        refId: ObjectId('61409537c4f113354782eea3'),
        goals: 1
    }
]

_id: ObjectId('56fd77c152d4a56752b8y89h'),
name: 'Team C',
players: [
    {
        refId: ObjectId('6413780af746f734b8720e5e'),
        goals: 9
    },
    {
        refId: ObjectId('6413781fab3a6458779aaa3f'),
        goals: 0
    }
]

I’m trying to make a clean up script that goes through all docs in my Team collection and pulls out the object if players.refId no longer exists in the Player collection. how can I do this?

So after the cleanup the teams would look like this:

_id: ObjectId('61961ce8981cfc332fd71711'),
name: 'Team A',
players: [
    {
        refId: ObjectId('6098c5ab615d9e23543d0f6e'),
        goals: 13
    },
    {
        refId: ObjectId('61409537c4f113354782eea3'),
        goals: 6
    }
]

_id: ObjectId('61fd27c152d4a56752b8f05c'),
name: 'Team B',
players: [
    {
        refId: ObjectId('61409537c4f113354782eea3'),
        goals: 1
    }
]

_id: ObjectId('56fd77c152d4a56752b8y89h'),
name: 'Team C',
players: [
]

So far I have this:

db.Team.aggregate([
    { $unwind: "$players" },  
    {
        $lookup: {
            from: "Player",
            localField: "players.refId",
            foreignField: "_id",
            as: "player"
        }
    },  
    { $match: { player: { $size: 1 } } },  
    {
        $group: {
            _id: "$_id",
            players: { $push: "$players" }
        }
    },
    { $project: { _id: 1, players: 1 } },
    { $merge: { into: "Team", on: "_id", whenMatched: "merge" } }
])

This solution works as long as there are at least one player in each team that exists in the players collection. When a team only has players that does not exist in the players collection, they are not cleaned up because { $match: { player: { $size: 1 } } } filters them away. How do I solve this?

2

Answers


  1. db.Team.aggregate([
      // Unwind the players array
      { $unwind: "$players" },
      
      // Lookup the player in the Player collection
      {
        $lookup: {
          from: "Player",
          localField: "players.refId",
          foreignField: "_id",
          as: "player"
        }
      },
      
      // Filter out players that don't exist in the Player collection
      { $match: { player: { $size: 1 } } },
      
      // Group the players back into an array and replace the original players array
      {
        $group: {
          _id: "$_id",
          name: { $first: "$name" },
          players: { $push: "$players" }
        }
      },
      
      // Project only the necessary fields
      {
        $project: {
          _id: 1,
          name: 1,
          players: {
            refId: 1,
            goals: 1
          }
        }
      }
    ])
    
    Login or Signup to reply.
  2. There is more than one way to do this:
    First, there is a unwind/lookup/filter/group as in answer 1 (I think the match needs a small fix though (see my comment above).

    I’ve stumbled onto a slightly better solution.

         // first get the player id's into one array
        [{$addFields: {
            player_id: {$map: {
               input: '$players',
                in: '$$this.refId'
               }}
        }},
    {
        '$lookup': {
          'from': '$Players', 
          'let': {
            'lst': '$player_id'
          }, 
          'pipeline': [
            {
              '$match': {
                '$expr': {
                  // using indexOfArray, so a missing player is -1
                  // and an existing player has an index gt -1 
                  '$gt': [
                    {
                      '$indexOfArray': [
                        '$$lst', '$_id'
                      ]
                    }, -1
                  ]
                }
              }
            }, 
            {
              '$sort': {
                'name': 1
              }
            }
          ], 
          'as': 'x'
        }
      }]
    

    I did this on my own data and changed the variable names since you didn’t provide enough data for the test try mongoplayground

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search