Filtering documents in MongoDB using Node.JS for multiple constrains

BhaveshShaha
September 1, 2024
135 views
0 votes
2 Answers

I have 1000 questions (each as a document) in questionBank collection.

Each question has its own document-id, a subject key (s1, s2, s3, s4, ... ,s10) and difficult key (easy, medium, hard).

My objective is to fetch exactly 15 questions among these such that the following constraints are met:

Subject: Questions from s1, s2, ..., s5 subjects (each) shouldn’t exceed 2. Questions from s6, s7, ..., s10 subjects (each) shouldn’t be more than 3.
Difficulty: We want a maximum of 40% easy questions, a maximum of 75% medium questions, and a maximum of 40% hard questions. (right now, we are randomly generating the number of questions per difficulty).
Each user has visitedQuestions collection where the list of all the questions they’ve already attempted is stored by the document-id, so we also want to refrain using the questions that have been previously attempted.

result = await QuestionBank.aggregate([
            {
              $facet: {
                easyQuestions: [
                  {
                    $match: {
                      difficulty: "Easy",
                      ...criteriaSet,
                    },
                  },
                  { $sample: { size: difficultyRanges.easy } },
                ],
                mediumQuestions: [
                  {
                    $match: {
                      difficulty: "Medium",
                      ...criteriaSet,
                    },
                  },
                  { $sample: { size: difficultyRanges.medium } },
                ],
                hardQuestions: [
                  {
                    $match: {
                      difficulty: "Hard",
                      ...criteriaSet,
                    },
                  },
                  { $sample: { size: difficultyRanges.hard } },
                ],
              },
            },
            {
              $project: {
                questions: {
                  $concatArrays: [
                    "$easyQuestions",
                    "$mediumQuestions",
                    "$hardQuestions",
                  ],
                },
              },
            },
            {
              $unwind: "$questions",
            },
            {
              $group: {
                _id: "$questions.subTopic",
                questions: { $push: "$questions" },
                count: { $sum: 1 },
              },
            },
            {
              $match: {
                $or: SAToicLimits,
              },
            },
            {
              $limit: 15,
            },
          ]);
          let resres: any = [];
          for (let i = 0; i < result.length; i++) {
            resres = [...resres, ...result[i].questions];
          }
          result = resres;
          if (result.length === 15) {
            break;
          }

Note: SAToicLimits has the name of the subjects and the respective limits.

What is the most efficient way to go about it?

Answers

I don’t think you can achieve this by an aggregation pipeline only.

My proposal would be this one: First create a random order of documents. This you can do by sorting by a hash value. Using $dateToString: { date: "$$NOW" } guaranties that the order is different for each call (unless executed multiple times within one millisecond). You could also use a static value (e.g. a, b, c) to get a random order which is deterministic.

Then create kind of bucket-list. Fill the bucket one-by-one till all constraints are done:

const visitedQuestions = db.visitedQuestions.find({user: "Shaha"}).toArray().map( x => x._id )

const questions = await QuestionBank.aggregate([
  { $match: { _id: { $nin: visitedQuestions } } },
  {
    $set: {
      orderBy: {
        $toHashedIndexKey: {
          $concat: [
            { $toString: "$_id" },
            { $dateToString: { date: "$$NOW" } } // or a static value for random orders which are deterministic
          ]
        }
      }
    }
  },
  { $sort: { orderBy: 1 } }
]).toArray();

let difficulties = { easy: 15*0.4, medium: 15*0.75, hard: 15*0.4 };
let subjects = { s1: 2, s2: 2, s3: 2, s4: 2, s5: 2, s6: 3, s7: 3, s8: 3, s9: 3, s10: 3}

let result = [];    
for (let q of questions) {
   if (difficulties[q.difficulty] > 0 && subjects[q.subject] > 0) {
      difficulties[q.difficulty]--;
      subjects[q.subject]--;
      result.push(q);
   }
   if (result.length > 15) break;
}
return result;

You can do the followings in the aggregation pipeline:

$lookup to the visitedQuestions collection and $match with the userId for the user that you are targeting
$match to check for no result is looked up from visitedQuestions. That means the question is not visited by the user.
$set a random key for every questions
$setWindowFields to compute subjectRank with $rank in the partition of subject and order by the randomKey we generated
repeat step 4, only with the change in partition to difficulty, to compute difficultyRank
$set a boolean/flag qualified, with the following criteria, in a $and manner:
1. subject criteria
  - if subject is in s1 – s5, we yield the result of subjectRank <= 2
  - if subject is in s6 – s10, we yield the result of subjectRank <= 3
  - otherwise, we yield a default value of false
2. difficulty criteria
  - if difficulty is easy, we yield the result of difficultyRank <= 15 * 40% = 6
  - if difficulty is medium, we yield the result of difficultyRank <= 15 * 75% = 11.25
  - if difficulty is hard, we yield the result of difficultyRank <= 15 * 40% = 6
  - otherwise, we yield a default value of false
Now with the qualified flag, we know which question can be picked by $match: {qualified: true}
$sample 15 questions
(optional) $unset the helper fields

db.questionBank.aggregate([
  {
    "$lookup": {
      "from": "visitedQuestions",
      "localField": "_id",
      "foreignField": "_id",
      "pipeline": [
        {
          "$match": {
            // the user's id that you are targeting
            "userId": ObjectId("5a934e000102030405000000")
          }
        }
      ],
      "as": "visitedQuestionsLookup"
    }
  },
  {
    "$match": {
      // not found in visitedQuestions
      "visitedQuestionsLookup": []
    }
  },
  {
    "$set": {
      "visitedQuestionsLookup": "$$REMOVE",
      "randomKey": {
        "$rand": {}
      }
    }
  },
  {
    "$setWindowFields": {
      "partitionBy": "$subject",
      "sortBy": {
        "randomKey": 1
      },
      "output": {
        "subjectRank": {
          "$rank": {}
        }
      }
    }
  },
  {
    "$setWindowFields": {
      "partitionBy": "$difficulty",
      "sortBy": {
        "randomKey": 1
      },
      "output": {
        "difficultyRank": {
          "$rank": {}
        }
      }
    }
  },
  {
    "$set": {
      "qualified": {
        "$and": [
          {
            "$switch": {
              "branches": [
                // s1 - s5 <= 2
                {
                  "case": {
                    "$in": [
                      "$subject",
                      [
                        "s1",
                        "s2",
                        "s3",
                        "s4",
                        "s5"
                      ]
                    ]
                  },
                  "then": {
                    "$lte": [
                      "$subjectRank",
                      2
                    ]
                  }
                },
                // s6 - s10 <= 3
                {
                  "case": {
                    "$in": [
                      "$subject",
                      [
                        "s6",
                        "s7",
                        "s8",
                        "s9",
                        "s10"
                      ]
                    ]
                  },
                  "then": {
                    "$lte": [
                      "$subjectRank",
                      3
                    ]
                  }
                }
              ],
              "default": false
            }
          },
          {
            "$switch": {
              "branches": [
                // 40% easy = 6
                {
                  "case": {
                    "$eq": [
                      "easy",
                      "$difficulty"
                    ]
                  },
                  "then": {
                    "$lte": [
                      "$difficultyRank",
                      6
                    ]
                  }
                },
                // 75% medium = 11.25
                {
                  "case": {
                    "$eq": [
                      "medium",
                      "$difficulty"
                    ]
                  },
                  "then": {
                    "$lte": [
                      "$difficultyRank",
                      11.25
                    ]
                  }
                },
                // 40% hard = 6
                {
                  "case": {
                    "$eq": [
                      "hard",
                      "$difficulty"
                    ]
                  },
                  "then": {
                    "$lte": [
                      "$difficultyRank",
                      6
                    ]
                  }
                }
              ],
              "default": false
            }
          }
        ]
      }
    }
  },
  {
    "$match": {
      "qualified": true
    }
  },
  {
    "$sample": {
      "size": 15
    }
  },
  {
    "$unset": [
      "difficultyRank",
      "qualified",
      "randomKey",
      "subjectRank"
    ]
  }
])

Mongo Playground

Please signup or login to give your own answer.

Click here to cancel reply.