skip to Main Content

I’ve been trying to find a way to limit the number of objects i’m pushing to arrays I’m creating while using "aggregate" on a MongoDB collection.
I have a collection of students – each has these relevant keys:
class number it takes this semester (only one value),
percentile in class (exists if is enrolled in class, null if not),
current score in class (> 0 if enrolled in class, else – 0),
total average (GPA),
max grade

I need to group all students who never failed, per class, in one array that contains those with a GPA higher than 80, and another array containing those without this GPA, sorted by their score in this specific class.

This is my query:

db.getCollection("students").aggregate([
{"$match": {
    "class_number": 
        {"$in": [49, 50, 16]},
    "grades.curr_class.percentile": 
        {"$exists": true},
    "grades.min": {"$gte": 80},
    }},
    {"$sort": {"grades.curr_class.score": -1}},
    {"$group": {"_id": "$class_number", 
                "studentsWithHighGPA": 
                    {"$push": 
                        {"$cond": [{"$gte": ["$grades.gpa", 80]},
                        {"id": "$_id"},
                        "$$REMOVE"]
                        }
                    },
                 "studentsWithoutHighGPA":
                 {"$push": 
                        {"$cond": [{"$lt": ["$grades.gpa", 80]},
                        {"id": "$_id"},
                        "$$REMOVE"]
                        }, 
                    },
                    }, 
                },
])

What i’m trying to do is limit the number of students in each of these arrays. I only want the top 16 in each array, but i’m not sure how to approach this.

Thanks in advance!

I’ve tried using limit in different variations, and slice too, but none seem to work.

3

Answers


  1. I don’t think there is a mongodb-provided operator to apply a limit inside of a $group stage.

    You could use $accumulator, but that requires server-side scripting to be enabled, and may have performance impact.

    Limiting studentsWithHighGPA to 16 throughout the grouping might look something like:

          "studentsWithHighGPA": {
            "$accumulator": {
              init: "function(){
                         return {combined:[]};
              }",
              accumulate: "function(state, id, score){
                                  if (score >= 80) {
                                        state.combined.push({_id:id, score:score})
                                  };
                                  return {combined:state.combined.slice(0,16)}
              }",
              accumulateArgs: [ "$_id", "$grades.gpa"],
              merge: "function(A,B){
                         return {combined: 
                                  A.combined.concat(B.combined).sort(
                                        function(SA,SB){
                                               return (SB.score - SA.score)
                                        })
                                }
              }",
              finalize: "function(s){
                   return s.combined.slice(0,16).map(function(A){
                      return {_id:A._id}
                   })
              }",
              lang: "js"
            }
          }
    

    Note that the score is also carried through until the very end so that partial result sets from different shards can be combined properly.

    Login or Signup to reply.
  2. Maybe MongoDB $facets are a solution. You can specify different output pipelines in one aggregation call.

    Something like this:

    const pipeline = [
        {
            '$facet': {
                'studentsWithHighGPA': [
                    { '$match': { 'grade': { '$gte': 80 } } }, 
                    { '$sort': { 'grade': -1 } }, 
                    { '$limit': 16 }
                ], 
                'studentsWithoutHighGPA': [
                    { '$match': { 'grade': { '$lt': 80 } } },
                    { '$sort': { 'grade': -1 } },
                    {  '$limit': 16 }
                ]
            }
        }
    ];
    
        
    coll.aggregate(pipeline)
    

    This should end up with one document including two arrays.

    studentsWithHighGPA (array)
        0 (object)
        1 (object)
        ...
    studentsWithoutHighGPA (array)
        0 (object)
        1 (object)
    

    See each facet as an aggregation pipeline on its own. So you can also include $group to group by classes or something else.

    https://www.mongodb.com/docs/manual/reference/operator/aggregation/facet/

    Login or Signup to reply.
  3. Since mongoDb version 5.0, one option is to use $setWindowFields for this, and in particular, its $rank option. This will allow to keep only the relevant students and limit their count even before the $group step:

    1. $match only relevant students as suggested by the OP
    2. $set the groupId for the setWindowFields (as it can currently partition by one key only
    3. $setWindowFields to define the rank of each student in their array
    4. $match only students with the wanted rank
    5. $group by class_number as suggested by the OP:
    db.collection.aggregate([
      {$match: { 
         class_number: {$in: [49, 50, 16]},
          "grades.curr_class.percentile": {$exists: true},
          "grades.min": {$gte: 80}
      }},
      {$set: {
          groupId: {$concat: [
              {$toString: "$class_number"},
              {$toString: {$toBool: {$gte: ["$grades.gpa", 80]}}}
          ]}
      }},
      {$setWindowFields: {
          partitionBy: "$groupId",
          sortBy: {"grades.curr_class.score": -1},
          output: {rank: {$rank: {}}}
      }},
      {$match: {rank: {$lte: rankLimit}}},
      {$group: {
          _id: "$class_number",
          studentsWithHighGPA: {$push: {
              $cond: [{$gte: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}},
          studentsWithoutHighGPA: {$push: {
              $cond: [{$lt: ["$grades.gpa", 80]}, {id: "$_id"}, "$$REMOVE"]}}
      }}
    ])
    

    See how it works on the playground example

    *This solution will limit the rank of the students, so there is an edge case of more than n students in the array (In case there are multiple students with the exact rank of n). it can be simply solved by adding a $slice step

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search