skip to Main Content

I have 2 collections users and files. A user document looks like this :

{
  "_id": "user1",
  "name": "John"
}

A file document looks like this :

{
  "_id": "fileID1",
  "url": "any_url",
  "userID": "user1"
}

I have a list of user IDs let’s say : ["user1", "user2", "user5"] and i want to perform a single query that takes this list as input and returns 5 files of each user ID.

For now i have a naive logic that is not really performant :

result_users = users_collection.find({}).limit(15)
  for user in result_users:
    user_id.append(user["_id"])
    result_files = files_collection.find({"userID": user["_id"]}).limit(5)
      for f in result_files:
        file_dict[user["_id"]].append(f)

The query i am looking for would naively be something like :
result_files = files_collection.find({"userID": {"$in / $each": user_ids_list}).limit(5)

The output i am looking for is :

{
  "user1": ["fileID1", "fileID2", "fileID8", "fileID3"],
  "user2": ["fileID4", "fileID5", "fileID6", "fileID9", "fileID7"],
  ...
}

With at max 5 files per userID.

I am pretty sure this is achievable with aggregation but i have no clue how to write it correctly.

Note : i don’t really care about the output format, i only need a list of file that is sorted correctly with 5 files per userID.

I can map the output myself when i am confident the query returns the right files.

Thanks !

Edit : it would be amazing to add a sort("at", DESCENDING) on the files collection aggregation

2

Answers


  1. That’s in JS, but I am sure in py you could do something like below:

    const user_ids_list = ["user1", "user2", "user5"];
    
    const result_files = await files_collection.aggregate([
      // Match files where userID is in the user_ids_list
      { 
        $match: { 
          userID: { $in: user_ids_list } 
        }
      },
      // Group by userID and collect up to 5 files per user
      { 
        $group: {
          _id: "$userID",
          files: { $push: "$_id" }
        }
      },
      // Limit to the first 5 files for each userID
      { 
        $project: { 
          files: { $slice: ["$files", 5] } 
        }
      }
    ]).toArray();
    
    console.log(result_files);
    

    This should give you an output similar to:

    [
      { "_id": "user1", "files": ["fileID1", "fileID2", "fileID8", "fileID3", "fileID9"] },
      { "_id": "user2", "files": ["fileID4", "fileID5", "fileID6", "fileID7", "fileID10"] },
      ...
    ]
    
    Login or Signup to reply.
  2. since you already have the user names in the file docs you won’t need to query user collection.

    1. First narrow down the files using a $match based on existence of user in input array
    2. Group by the userID field and push the file id and the timestamp (will need later for sorting)
    3. Now call $sortArray on the grouped file array (v). Now it is sorted in descending(-1). On that sorted call $slice and finally map over the sliced array to get an array of just the ids.
      this will now generate in the format [ { _id:"user1",v: [...]}, { _id:"user2",v: [...]}... ]
    db.files.aggregate([
      {
        $match: { userID: { $in: ["user1", "user2", "user5"] } }
      },
      {
        $group: {
          _id: "$userID",
          v: { $push: { id: "$_id", at: "$at" } }
        }
      },
      {
        $project: {
          v: {
            $map: {
              input: { $slice: [{ $sortArray: { input: "$v", sortBy: { at: -1 } } }, 5] },
              in: "$$this.id"
            }
          }
        }
      }
    ])
    
    

    playground

    if you want to further convert it to { "user1": [...], "user2": [...], ... } format you can add these 2 stages

      {
        $group: {
          _id: null,
          kv: { $push: { k: "$_id", v: "$v" } }
        }
      },
      {
        $replaceRoot: { newRoot: { $arrayToObject: "$kv" } }
      }
    
    

    playground

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search