skip to Main Content

I am using mongoose to query a really big list from Mongodb

const chat_list = await chat_model.find({}).sort({uuid: 1}); // uuid is a index
const msg_list = await message_model.find({}, {content: 1, xxx}).sort({create_time: 1});// create_time is a index of message collection, time: t1

// chat_list length is around 2,000, msg_list length is around 90,000
compute(chat_list, msg_list); // time: t2

function compute(chat_list, msg_list) {
  for (let i = 0, len = chat_list.length; i < len; i++) {
      msg_list.filter(msg => msg.uuid === chat_list[i].uuid)
    // consistent handling for every message
  }
}

for above code, t1 is about 46s, t2 is about 150s

t2 is really to big, so weird.

then I cached these list to local json file,

const chat_list = require('./chat-list.json');
const msg_list = require('./msg-list.json');

compute(chat_list, msg_list); // time: t2

this time, t2 is around 10s.

so, here comes the question, 150 seconds vs 10 seconds, why? what happened?

I tried to use worker to do the compute step after mongo query, but the time is still much bigger than 10s

3

Answers


  1. The mongodb query returns a FindCursor that includes arrayish methods like .filter() but the result is not an Array.

    Use .toArray() on the cursor before filtering to process the mongodb result set like for like. That might not make the overall process any faster, as the result set still needs to be fetched from mongodb, but compute will be similar.

    const chat_list = await chat_model
      .find({})
      .sort({uuid: 1})
      .toArray()
    const msg_list = await message_model
      .find({}, {content: 1, xxx})
      .sort({create_time: 1})
      .toArray()
    
    Login or Signup to reply.
  2. Matt typed faster than I did, so some of what was suggested aligns with part of this answer.

    I think you are measuring and comparing something different than what you are expecting and implying.

    Your expectation is that the compute() function takes around 10 seconds once all of the data is loaded by the application. This is (mostly) demonstrated by your second test, apart from the fact that that test includes the time it takes to load the data from the local files. But you’re seeing that there is a difference of 104 seconds (150 - 46) between the completion of message_model.find() and compute() hence leading to the question.

    The key thing is that successfully advancing from the find against message_model is not the same thing as retrieving all of the results. As @Matt notes, the find() will return with a cursor object once the initial batch of results are ready. That is very different than retrieving all of the results. So there is more work (apparently ~94 seconds worth) left to do from the two find() operations to further iterate the cursors and retrieve the rest of the results. This additional time is getting reported inside of t2.

    Ass suggested by @Matt, calling .toArray() should shift that time back into t1 as you are expecting. Also sounds like it may be more correct due to ambiguity with .filter() functions.

    There are two other things that catch my attention. The first is: why are you retrieving all of this data client-side to do the filtering there? Perhaps you would like to do this uuid matching inside of the database via $lookup?

    Secondly, this comment isn’t clear to me:

    // create_time is a index of message collection, time: t1
    

    create_time itself is a field here, existent or not, that you are requesting an ascending sort against.

    Login or Signup to reply.
  3. You are taking data from 2 tables, then with for loop you are comparing ID using filter function, what is happening now is your loop will be executed 2000 time and so the filter function also which contains 90000 records.

    So take a worst case scenario here lets consider 2000 uuid you are getting is not inside the msg_list, here you are executing loop 2000*90000 even though you are not getting data.

    It wan’t take more than 10 to 15 secs if use below code.

    //This will generate array of uuid present in message_model
    const msg_list = await message_model.find({}, {content: 1, xxx}).sort({create_time: 1}).distinct("uuid");
    
    // Below query will match all uuid present in msg_list array with chat_list UUID
    const chat_list = await chat_model.find({uuid:{$in:msg_list}}).sort({uuid: 1});
    

    The above result is doing same as you have done in your code with filter function and loop but this is proper and fastest way to receive the data you required.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search