skip to Main Content

I apologize for the vague question description, but I have quite a complex question regarding filtration in MongoDB aggregations. Please, see my data schema to understand the question better:

Company {
  _id: ObjectId
 name: string
}
License {
  _id: ObjectId
  companyId: ObjectId
  userId: ObjectId
}
User {
  _id: ObjectId
  companyId: ObjectId
  email: string
}

The goal:

I would like to query all non-licensed users. In order to do this, you would need these plain MongoDB queries:

const licenses = db.licenses.find({ companyId }); // Get all licenses for specific company
const userIds = licenses.toArray().map(l => l.userId); // Collect all licensed user ids

const nonLicensedUsers = db.users.find({ _id: { $nin: userIds } }); // Query all users that don't hold a license

The problem:

The code above works perfectly fine. However, in our system, companies may have hundreds of thousands of users. Therefore, the first and the last step become exceptionally expensive. I’ll elaborate on this. First things first, you need to fetch a big number of documents from DB and transmit them via the network, which is fairly expensive. Then, we need to pass a huge $nin query to MongoDB over the network again, which doubles overhead costs.

So, I would like to perform all the mentioned operations on the MongoDB end and return a small slice of non-licensed users to avoid network transmission costs. Are there ideas on how to achieve this?

I was able to come pretty close using the following aggregation (pseudo-code):

db.company.aggregate([
  { $match: { _id: id } }, // Step 1. Find the company entity by id
  { $lookup: {...} }, // Step 2. Joins 'users' collection by `companyId` field
  { $lookup: {...} }, // Step 3. Joins 'licenses' collection by `companyId` field
  { 
    $project: {
      licensesMap: // Step 4. Convert 'licenses' array to the map with the shape { 'user-id': true }. Could be done with $arrayToObject operator
    }
  },
  {
    $project: {
       unlicensedUsers: {
             $filter: {...} // And this is the place, where I stopped
          }
     }
  }
]);

Let’s have a closer look at the final stage of the above aggregation. I tried to utilize the $filter aggregation in the following manner:

{
    $filter: {
     input: "$users"
     as: "user",
     cond: {
       $neq: ["$licensesMap[$$user._id]", true]
     }
  }
}

But, unfortunately, that didn’t work. It seemed like MongoDB didn’t apply interpolation and just tried to compare a raw "$licensesMap[$$user._id]" string with true boolean value.

Note #1:

Unfortunately, we’re not in a position to change the current data schema. It would be costly for us.

Note #2:

I didn’t include this in the aggregation example above, but I did convert Mongo object ids to strings to be able to create the licensesMap. And also, I stringified the ids of the users list to be able to access licensesMap properly.

Sample data:

Companies collection:

[
  { _id: "1", name: "Acme" }
]

Licenses collection

[
  { _id: "1", companyId: "1", userId: "1" },
  { _id: "2", companyId: "1", userId: "2" }
]

Users collection:

[
  { _id: "1", companyId: "1" },
  { _id: "2", companyId: "1" },
  { _id: "3", companyId: "1" },
  { _id: "4", companyId: "1" },
]

The expected result is:

[
  _id: "1", // company id
  name: "Acme",
  unlicensedUsers: [
    { _id: "3", companyId: "1" },
    { _id: "4", companyId: "1" },
  ]
]

Explanation: unlicensedUsers list contains the third and the fourth users because they don’t have corresponding entries in the licenses collection.

2

Answers


  1. Chosen as BEST ANSWER

    If you're facing a similar situation. Bear in mind that the above solution will work fast only with the hashed index.


  2. How about something simple like:

    db.usersCollection.aggregate([
      {
        $lookup: {
          from: "licensesCollection",
          localField: "_id",
          foreignField: "userId",
          as: "licensedUsers"
        }
      },
      {$match: {"licensedUsers.0": {$exists: false}}},
      {
        $group: {
          _id: "$companyId",
          unlicensedUsers: {$push: {_id: "$_id", companyId: "$companyId"}}
        }
      },
      {
        $lookup: {
          from: "companiesCollection",
          localField: "_id",
          foreignField: "_id",
          as: "company"
        }
      },
      {$project: {unlicensedUsers: 1, company: {$arrayElemAt: ["$company", 0]}}},
      {$project: {unlicensedUsers: 1, name: "$company.name"}}
    ])
    

    playground example

    users collection and licenses collection, both have anything you need on the users so after the first $lookup that "merges" them, and a simple $match to keep only the unlicensed users, all that left is just formatting to the format you request.

    Bonus: This solution can work with any type of id. For example playground

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search