I am using Mongoose to query a really big list from MongoDB:
const chat_list = await chat_model.find({}).sort({uuid: 1}); // uuid is an index
const msg_list = await message_model.find({}, {content: 1, xxx}).sort({create_time: 1}); // create_time is an index on the message collection; time: t1
// chat_list length is around 2,000, msg_list length is around 90,000
compute(chat_list, msg_list); // time: t2
function compute(chat_list, msg_list) {
  for (let i = 0, len = chat_list.length; i < len; i++) {
    msg_list.filter(msg => msg.uuid === chat_list[i].uuid)
    // consistent handling for every matching message
  }
}
For the above code, t1 is about 46 s and t2 is about 150 s.
t2 is really too big, which seems weird.
Then I cached these lists to local JSON files:
const chat_list = require('./chat-list.json');
const msg_list = require('./msg-list.json');
compute(chat_list, msg_list); // time: t2
This time, t2 is around 10 s.
So here is the question: 150 seconds vs. 10 seconds. Why? What happened?
I tried using a worker to do the compute step after the Mongo query, but the time is still much larger than 10 s.
3 Answers
The MongoDB query returns a FindCursor that includes array-ish methods like .filter(), but the result is not an Array. Use .toArray() on the cursor before filtering, so that you process the MongoDB result set like for like. That might not make the overall process any faster, as the result set still needs to be fetched from MongoDB, but compute will then take a similar amount of time. Matt typed faster than I did, so some of what was suggested aligns with part of this answer.
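A minimal sketch of that with the native MongoDB driver (the connection string, database name, and collection names are assumptions, not taken from the question):

const { MongoClient } = require('mongodb');

async function loadLists() {
  const client = new MongoClient('mongodb://localhost:27017'); // connection string assumed
  await client.connect();
  const db = client.db('mydb'); // database name assumed

  // .toArray() drains each FindCursor, so the fetch time is paid here (in t1)
  // and compute() then works on plain Arrays, just like the JSON-file test.
  const chat_list = await db.collection('chats')
    .find({})
    .sort({ uuid: 1 })
    .toArray();
  const msg_list = await db.collection('messages')
    .find({}, { projection: { content: 1, uuid: 1, create_time: 1 } })
    .sort({ create_time: 1 })
    .toArray();

  await client.close();
  return { chat_list, msg_list };
}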
I think you are measuring and comparing something different than what you are expecting and implying.

Your expectation is that the compute() function takes around 10 seconds once all of the data is loaded by the application. This is (mostly) demonstrated by your second test, apart from the fact that that test includes the time it takes to load the data from the local files. But you're seeing that there is a difference of 104 seconds (150 - 46) between the completion of message_model.find() and compute(), hence leading to the question.

The key thing is that successfully advancing from the find against message_model is not the same thing as retrieving all of the results. As @Matt notes, the find() will return with a cursor object once the initial batch of results is ready. That is very different than retrieving all of the results. So there is more work (apparently ~94 seconds' worth) left to do from the two find() operations to further iterate the cursors and retrieve the rest of the results. This additional time is getting reported inside of t2.
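To make that concrete, here is a minimal sketch of iterating the query as a Mongoose cursor (the projected uuid field and the timing labels are assumptions, not the asker's code); it makes the time spent pulling the remaining batches from the server explicit:

async function fetchMessages() {
  const cursor = message_model
    .find({}, { content: 1, uuid: 1, create_time: 1 })
    .sort({ create_time: 1 })
    .cursor(); // a Mongoose QueryCursor: documents are fetched batch by batch

  const msg_list = [];
  console.time('fetch remaining batches');
  for await (const msg of cursor) { // each iteration may pull another batch from the server
    msg_list.push(msg);
  }
  console.timeEnd('fetch remaining batches');
  return msg_list;
}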
As suggested by @Matt, calling .toArray() should shift that time back into t1, as you are expecting. It also sounds like it may be more correct, given the ambiguity with the .filter() functions.

There are two other things that catch my attention. The first is: why are you retrieving all of this data client-side to do the filtering there? Perhaps you would like to do this uuid matching inside of the database via $lookup?

Secondly, this comment isn't clear to me: "create_time is an index on the message collection". create_time itself is a field here, existent or not, that you are requesting an ascending sort against.

You are taking data from two collections, then with a for loop you are comparing uuids using the filter function. What is happening is that your loop runs 2,000 times, and each pass runs the filter function over all 90,000 records.
So take the worst-case scenario: suppose none of the 2,000 uuids is present in msg_list; you are still executing the loop 2,000 * 90,000 times even though you get no data back.
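As an aside, and not part of this answer's suggestion, even in plain JavaScript that quadratic scan can be avoided by indexing msg_list by uuid once; a sketch:

function compute(chat_list, msg_list) {
  // Single pass over the ~90,000 messages to group them by uuid.
  const msgsByUuid = new Map();
  for (const msg of msg_list) {
    const group = msgsByUuid.get(msg.uuid);
    if (group) group.push(msg);
    else msgsByUuid.set(msg.uuid, [msg]);
  }
  // Each of the ~2,000 chats now does an O(1) lookup instead of a full scan.
  for (const chat of chat_list) {
    const msgs = msgsByUuid.get(chat.uuid) || [];
    // consistent handling for every matching message, as in the question
  }
}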
It won't take more than 10 to 15 seconds if you use something like the code below.
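A minimal sketch of such a query, assuming an aggregation with $lookup and that the messages collection is physically named 'messages' (field and model names follow the question):

const chat_with_msgs = await chat_model.aggregate([
  { $sort: { uuid: 1 } },
  {
    // Join each chat to its messages inside MongoDB instead of in Node.js.
    $lookup: {
      from: 'messages',        // physical collection name (assumed)
      localField: 'uuid',
      foreignField: 'uuid',
      as: 'messages',
    },
  },
  // Keep only the fields the question projects, plus uuid for matching.
  { $project: { uuid: 1, 'messages.content': 1, 'messages.create_time': 1 } },
]);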
The above does the same thing as your filter-and-loop code, but it is the proper and fastest way to retrieve the data you need.