I have some MongoDB queries that are returning 1000 results. When I look in the mongodb.com profiler, it shows me this:
{
  "command": {
    "getMore": 8223354687588024000,
    "collection": "reservations",
    "batchSize": 1000,
    "lsid": {
      "id": {
        "$binary": {
          "base64": "n8eH91eURw+xpT6fNEPURQ==",
          "subType": "04"
        }
      }
    },
    "$clusterTime": {
      "clusterTime": {
        "$timestamp": {
          "t": 1659066401,
          "i": 542
        }
      },
      "signature": {
        "hash": {
          "$binary": {
            "base64": "PHh4eHh4eD4=",
            "subType": "00"
          }
        },
        "keyId": 7090947493382324000
      }
    },
    "$db": "superhosttools"
  },
  "originatingCommand": {
    "aggregate": "reservations",
    "pipeline": [
      {
        "$changeStream": {
          "fullDocument": "updateLookup"
        }
      }
    ],
    "cursor": {},
    "lsid": {
      "id": {
        "$binary": {
          "base64": "n8eH91eURw+xpT6fNEPURQ==",
          "subType": "04"
        }
      }
    },
    "$clusterTime": {
      "clusterTime": {
        "$timestamp": {
          "t": 1659064839,
          "i": 29
        }
      },
      "signature": {
        "hash": {
          "$binary": {
            "base64": "PHh4eHh4eD4=",
            "subType": "00"
          }
        },
        "keyId": 7090947493382324000
      }
    },
    "$db": "superhosttools"
  },
  "planSummary": [
    {
      "COLLSCAN": {}
    }
  ],
  "cursorid": 8223354687588024000,
  "keysExamined": 0,
  "docsExamined": 26879,
  "numYields": 210,
  "nreturned": 1000,
  "reslen": 15283228,
  "locks": {
    "ReplicationStateTransition": {
      "acquireCount": {
        "w": 2206
      }
    },
    "Global": {
      "acquireCount": {
        "r": 2206
      }
    },
    "Database": {
      "acquireCount": {
        "r": 2206
      }
    },
    "Collection": {
      "acquireCount": {
        "r": 1994
      }
    },
    "Mutex": {
      "acquireCount": {
        "r": 1996
      }
    },
    "oplog": {
      "acquireCount": {
        "r": 211
      }
    }
  },
  "storage": {
    "data": {
      "bytesRead": 2083760,
      "timeReadingMicros": 4772
    }
  },
  "protocol": "op_msg",
  "millis": 249,
  "v": "4.2.21"
}
This looks like the interesting part. We use change streams, but I don’t know why we would get 1000 results:
"pipeline": [
{
"$changeStream": {
"fullDocument": "updateLookup"
}
}
],
I’m trying to optimize my MongoDB server. Any help making this query more efficient is appreciated.
Update #1
I removed the {"fullDocument": "updateLookup"} parameter from my watch code, which seemed to help, but I’m still getting some similar queries returning 1000 documents:
"aggregate": "reservations",
"pipeline": [
{
"$changeStream": {
"fullDocument": "default"
}
}
],
I’m now using the following code to implement the change stream:
Reservation.watch([]).on("change", async (change: ChangeEvent<ReservationDocument>) => {...});
I’m now wondering if I should add a query to the .watch([]) call to limit the number of documents. What is considered best practice with change streams?
2 Answers
Let’s look at the originatingCommand part of your profiling output: it tells us that this getMore was triggered by your change stream aggregation pipeline.
Although you are using change streams, internally they are implemented as a resumable cursor. That cursor periodically calls the getMore command ("getMore": 8223354687588024000 in your output) to fetch data in batches, and your batch size is 1000. Have a look at the MongoDB Community discussion on this topic; it should clear up your concern.
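To make that concrete, here is a rough sketch of the change stream driven as a cursor, using the Reservation model and the ChangeEvent/ReservationDocument types from your question (the function name is just for illustration). Each time the locally buffered batch runs out, the driver issues the getMore you see in the profiler to fetch the next batch:
async function consumeReservationChanges(): Promise<void> {
  const changeStream = Reservation.watch([]);
  // hasNext()/next() drain the driver's local buffer; when it is empty,
  // the driver sends a getMore to the server for up to batchSize more events.
  while (await changeStream.hasNext()) {
    const change = (await changeStream.next()) as ChangeEvent<ReservationDocument>;
    // process a single change event here
  }
}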
Your cursors are being limited to 1000 results by batchSize. You can add batchSize as an optional parameter to the collection.watch() call (a sketch follows below). Since change streams are cursors, you can also apply a limit() to the cursor before retrieving documents. You may want to filter the stream to only the events you care about (see below); barring that, the defaults are the best practice.
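Here is what passing batchSize to watch() might look like with the question's Reservation model (a sketch, not a recommendation; the value 100 is arbitrary):
// Sketch only: batchSize caps how many events each getMore fetches.
// It does not change which events arrive, only how they are batched.
const changeStream = Reservation.watch([], { batchSize: 100 });
changeStream.on("change", async (change: ChangeEvent<ReservationDocument>) => {
  // handle the change event exactly as before
});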
The question of optimizing a MongoDB server in general is too broad for this question, so I’ll keep this response limited to change streams, as that appears to be the specific use case OP is asking about.
Starting in MongoDB 5.1, change streams are optimized, providing more efficient resource utilization and faster execution of some aggregation pipeline stages. If you aren’t already on a newer version, updating to 5.1 or newer will provide a performance boost.
You can look at the Change Streams Production Recommendations to see if you are complying with the official MongoDB advice; only one section of it is relevant to performance.
So if you are using collection.watch() on a very active collection and you only need to act on certain changes, use $match to filter down to just the changes you care about. For example, if you only care about items authored by "dave":
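A possible sketch (the fullDocument.author field path is an assumption made for illustration; substitute whatever field identifies the author in your schema):
// Only events whose full document has author "dave" are delivered.
// Note: for update events, fullDocument is only populated when the stream
// is opened with { fullDocument: "updateLookup" }; without it this match
// will mainly catch inserts and replaces.
const pipeline = [{ $match: { "fullDocument.author": "dave" } }];
Reservation.watch(pipeline).on("change", async (change: ChangeEvent<ReservationDocument>) => {
  // only the filtered change events arrive here
});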
More considerations on batch size
As for your previous questions about batch size, reducing the batch size won’t really have an effect on performance (with one exception noted below), so if performance is your only concern, you’ll want to look elsewhere. The docs note that, in most cases, modifying the batch size will not affect the user or the application, since the driver returns results as if MongoDB had returned a single batch.
There is one caveat, explained in great detail here: