I apologize for the vague question description, but I have quite a complex question regarding filtration in MongoDB aggregations. Please, see my data schema to understand the question better:
Company {
_id: ObjectId
name: string
}
License {
_id: ObjectId
companyId: ObjectId
userId: ObjectId
}
User {
_id: ObjectId
companyId: ObjectId
email: string
}
The goal:
I would like to query all non-licensed users. In order to do this, you would need these plain MongoDB queries:
const licenses = db.licenses.find({ companyId }); // Get all licenses for specific company
const userIds = licenses.toArray().map(l => l.userId); // Collect all licensed user ids
const nonLicensedUsers = db.users.find({ _id: { $nin: userIds } }); // Query all users that don't hold a license
The problem:
The code above works perfectly fine. However, in our system, companies may have hundreds of thousands of users. Therefore, the first and the last step become exceptionally expensive. I’ll elaborate on this. First things first, you need to fetch a big number of documents from DB and transmit them via the network, which is fairly expensive. Then, we need to pass a huge $nin
query to MongoDB over the network again, which doubles overhead costs.
So, I would like to perform all the mentioned operations on the MongoDB end and return a small slice of non-licensed users to avoid network transmission costs. Are there ideas on how to achieve this?
I was able to come pretty close using the following aggregation (pseudo-code):
db.company.aggregate([
{ $match: { _id: id } }, // Step 1. Find the company entity by id
{ $lookup: {...} }, // Step 2. Joins 'users' collection by `companyId` field
{ $lookup: {...} }, // Step 3. Joins 'licenses' collection by `companyId` field
{
$project: {
licensesMap: // Step 4. Convert 'licenses' array to the map with the shape { 'user-id': true }. Could be done with $arrayToObject operator
}
},
{
$project: {
unlicensedUsers: {
$filter: {...} // And this is the place, where I stopped
}
}
}
]);
Let’s have a closer look at the final stage of the above aggregation. I tried to utilize the $filter aggregation in the following manner:
{
$filter: {
input: "$users"
as: "user",
cond: {
$neq: ["$licensesMap[$$user._id]", true]
}
}
}
But, unfortunately, that didn’t work. It seemed like MongoDB didn’t apply interpolation and just tried to compare a raw "$licensesMap[$$user._id]"
string with true
boolean value.
Note #1:
Unfortunately, we’re not in a position to change the current data schema. It would be costly for us.
Note #2:
I didn’t include this in the aggregation example above, but I did convert Mongo object ids to strings to be able to create the licensesMap
. And also, I stringified the ids of the users
list to be able to access licensesMap
properly.
Sample data:
Companies collection:
[
{ _id: "1", name: "Acme" }
]
Licenses collection
[
{ _id: "1", companyId: "1", userId: "1" },
{ _id: "2", companyId: "1", userId: "2" }
]
Users collection:
[
{ _id: "1", companyId: "1" },
{ _id: "2", companyId: "1" },
{ _id: "3", companyId: "1" },
{ _id: "4", companyId: "1" },
]
The expected result is:
[
_id: "1", // company id
name: "Acme",
unlicensedUsers: [
{ _id: "3", companyId: "1" },
{ _id: "4", companyId: "1" },
]
]
Explanation: unlicensedUsers
list contains the third and the fourth users because they don’t have corresponding entries in the licenses
collection.
2
Answers
If you're facing a similar situation. Bear in mind that the above solution will work fast only with the
hashed
index.How about something simple like:
playground example
users
collection andlicenses
collection, both have anything you need on the users so after the first$lookup
that "merges" them, and a simple$match
to keep only the unlicensed users, all that left is just formatting to the format you request.Bonus: This solution can work with any type of id. For example playground