Here is a snapshot of my code.
exports.getTestCases = async (bucketName, startTimeStamp, endTimeStamp) => {
  let results = [];
  let startTime = parseInt(startTimeStamp);
  let endTime = parseInt(endTimeStamp);
  const maxDataLimit = 1500;
  const maxYearsBack = 10;
  const oneYearInMilliseconds = 31536000000;

  let query = {
    executiondate: { $gte: startTime, $lte: endTime } // --> 'executiondate' is indexed in DB
  };
  const projection = {'Some of my required fields'};
  const DynamicModel = getModelForCollection(bucketName);

  while (results.length < maxDataLimit) {
    console.log(`Fetching data: Start Time: ${new Date(startTime)}, End Time: ${new Date(endTime)}`);
    try {
      results = await DynamicModel.find(query, projection)
        .sort({ executiondate: -1 })
        .limit(maxDataLimit)
        .lean()
        .exec();
      console.log(`Total records fetched: ${results.length}`);
      if (results.length >= maxDataLimit) break;
    } catch (error) {
      console.error("Failed to fetch data:", error);
      throw error;
    }
    endTime -= oneYearInMilliseconds;
    query.executiondate.$gte = endTime;
    if ((startTime - endTime) > maxYearsBack * oneYearInMilliseconds) {
      break;
    }
  }
  return results.slice(0, maxDataLimit);
};
What the code does is fetch up to 10 years of data if the max limit is not met, extending the date range one year at a time. The problem is that it takes a minimum of 7-8 seconds to fetch, say, 600 records, because it iterates. bucketName is the name of the MongoDB collection.
Is there any way to improve the fetch? Maybe a different approach to rewrite the whole logic? I cannot think of an alternative to the loop, which I am guessing is the culprit. I am already using lean(), which gave a slight improvement, but it is not satisfactory. I am new to Node.js and MongoDB, so any suggestion will be helpful.
2 Answers
Your query for the executiondate field is doing a full scan of the database every time, then sorting the results. Think of it: how can the engine do the comparisons without looking at every document? The best practice for query sequence is "ESR": find Exact match items, then Sort, then Range. You are doing the sort after the range.
Also, if you haven't done so, create an index for executiondate.
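For example, assuming the model returned by getModelForCollection is built from a Mongoose schema (the schema is not shown in the question, so everything here except the executiondate field name is a placeholder), the index could be declared like this:

// Sketch only: declare a descending index on executiondate so it matches
// .sort({ executiondate: -1 }). Schema name and other fields are placeholders.
const mongoose = require('mongoose');

const testCaseSchema = new mongoose.Schema(
  { executiondate: Number },
  { strict: false }
);
testCaseSchema.index({ executiondate: -1 });

Alternatively, the same index can be created once directly on the collection, e.g. db.getCollection('<bucketName>').createIndex({ executiondate: -1 }) from the mongo shell.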
I hope this helps.

That query is properly indexed, but the loop seems a bit inefficient.
If we consider a run that requires reading back to 10 years, what will happen is:
- Query and fetch all of the documents in the requested date range
- Then query again with the range extended back one year, re-fetching everything from the first query plus one more year of documents
- Then repeat, again, 8 more times.
This means that if you have 100 documents per year for the last 10 years, this full process will return 1000 documents, but the documents for the first year will have been fetched and transferred over the network 10 times, the second year 9 times, and so on, for a total of 100 × (10 + 9 + … + 1) = 5500 documents fetched and transferred over the network.
I can think of 2 different ways to improve this loop:
1. Initialize results to an empty array before the loop starts, then inside each loop query only for the current year's worth of data, i.e. $gte: endTime, $lt: lastEndTime, and combine the returned documents with the results array (see the first sketch after this list).
2. Counting documents with a query on an indexed field can use a very efficient COUNT_SCAN, so instead of returning documents in the loop, just ask the server for the count of documents in the range. A COUNT_SCAN will not need to read any documents, and will return only a number over the network, so 10 count commands might even complete faster than a single fetch. After the loop, you will know which start time will result in enough documents to satisfy the requirement, so a single find query after the loop can get all the necessary documents in a single go (see the second sketch after this list).
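A rough sketch of the first option, reusing the names from the question (DynamicModel, projection, maxDataLimit, oneYearInMilliseconds) and meant to replace the loop inside the existing async getTestCases function; the exact window boundaries are an assumption and should be adapted to the caller's original startTime/endTime semantics:

// Option 1 sketch: fetch one year per iteration and accumulate, so each
// document crosses the network at most once.
let results = [];
let lastEndTime = endTime; // upper bound of the current one-year chunk
for (let year = 0; year < maxYearsBack && results.length < maxDataLimit; year++) {
  const chunkStart = lastEndTime - oneYearInMilliseconds;
  const batch = await DynamicModel.find(
    { executiondate: { $gte: chunkStart, $lt: lastEndTime } },
    projection
  )
    .sort({ executiondate: -1 })
    .limit(maxDataLimit - results.length) // only fetch what is still missing
    .lean()
    .exec();
  results = results.concat(batch);
  lastEndTime = chunkStart; // slide the window back one year
}
return results.slice(0, maxDataLimit);

And a sketch of the second option: countDocuments on the indexed executiondate field only returns a number, and the single find at the end fetches everything in one round trip. Again, the names and the window arithmetic are taken from the question and are illustrative, not a drop-in implementation:

// Option 2 sketch: use cheap counts to find how far back we must go,
// then issue a single find for the whole range.
let rangeStart = endTime - oneYearInMilliseconds;
for (let year = 1; year < maxYearsBack; year++) {
  // Only a count crosses the network here, no documents.
  const count = await DynamicModel.countDocuments({
    executiondate: { $gte: rangeStart, $lte: endTime }
  });
  if (count >= maxDataLimit) break;
  rangeStart -= oneYearInMilliseconds; // extend the range one more year back
}
const results = await DynamicModel.find(
  { executiondate: { $gte: rangeStart, $lte: endTime } },
  projection
)
  .sort({ executiondate: -1 })
  .limit(maxDataLimit)
  .lean()
  .exec();
return results;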