I have 1000 questions (each as a document) in questionBank
collection.
Each question has its own document-id, a subject key (s1, s2, s3, s4, ... ,s10)
and difficult key (easy, medium, hard)
.
My objective is to fetch exactly 15 questions among these such that the following constraints are met:
- Subject: Questions from
s1, s2, ..., s5
subjects (each) shouldn’t exceed 2. Questions froms6, s7, ..., s10
subjects (each) shouldn’t be more than 3. - Difficulty: We want a maximum of 40% easy questions, a maximum of 75% medium questions, and a maximum of 40% hard questions. (right now, we are randomly generating the number of questions per difficulty).
- Each user has
visitedQuestions
collection where the list of all the questions they’ve already attempted is stored by the document-id, so we also want to refrain using the questions that have been previously attempted.
result = await QuestionBank.aggregate([
{
$facet: {
easyQuestions: [
{
$match: {
difficulty: "Easy",
...criteriaSet,
},
},
{ $sample: { size: difficultyRanges.easy } },
],
mediumQuestions: [
{
$match: {
difficulty: "Medium",
...criteriaSet,
},
},
{ $sample: { size: difficultyRanges.medium } },
],
hardQuestions: [
{
$match: {
difficulty: "Hard",
...criteriaSet,
},
},
{ $sample: { size: difficultyRanges.hard } },
],
},
},
{
$project: {
questions: {
$concatArrays: [
"$easyQuestions",
"$mediumQuestions",
"$hardQuestions",
],
},
},
},
{
$unwind: "$questions",
},
{
$group: {
_id: "$questions.subTopic",
questions: { $push: "$questions" },
count: { $sum: 1 },
},
},
{
$match: {
$or: SAToicLimits,
},
},
{
$limit: 15,
},
]);
let resres: any = [];
for (let i = 0; i < result.length; i++) {
resres = [...resres, ...result[i].questions];
}
result = resres;
if (result.length === 15) {
break;
}
Note: SAToicLimits
has the name of the subjects and the respective limits.
What is the most efficient way to go about it?
2
Answers
I don’t think you can achieve this by an aggregation pipeline only.
My proposal would be this one: First create a random order of documents. This you can do by sorting by a hash value. Using
$dateToString: { date: "$$NOW" }
guaranties that the order is different for each call (unless executed multiple times within one millisecond). You could also use a static value (e.g.a
,b
,c
) to get a random order which is deterministic.Then create kind of bucket-list. Fill the bucket one-by-one till all constraints are done:
You can do the followings in the aggregation pipeline:
$lookup
to thevisitedQuestions
collection and$match
with theuserId
for the user that you are targeting$match
to check for no result is looked up fromvisitedQuestions
. That means the question is not visited by the user.$set
a random key for every questions$setWindowFields
to computesubjectRank
with$rank
in the partition ofsubject
and order by therandomKey
we generateddifficulty
, to computedifficultyRank
$set
a boolean/flagqualified
, with the following criteria, in a$and
manner:subject
is in s1 – s5, we yield the result ofsubjectRank
<= 2subject
is in s6 – s10, we yield the result ofsubjectRank
<= 3difficulty
is easy, we yield the result ofdifficultyRank
<= 15 * 40% = 6difficulty
is medium, we yield the result ofdifficultyRank
<= 15 * 75% = 11.25difficulty
is hard, we yield the result ofdifficultyRank
<= 15 * 40% = 6qualified
flag, we know which question can be picked by$match: {qualified: true}
$sample
15 questions$unset
the helper fieldsMongo Playground