I have some MongoDB queries that are returning 1000 results. When I look in the mongodb.com profiler, it shows me this:
{
  "command": {
    "getMore": 8223354687588024000,
    "collection": "reservations",
    "batchSize": 1000,
    "lsid": {
      "id": {
        "$binary": {
          "base64": "n8eH91eURw+xpT6fNEPURQ==",
          "subType": "04"
        }
      }
    },
    "$clusterTime": {
      "clusterTime": {
        "$timestamp": {
          "t": 1659066401,
          "i": 542
        }
      },
      "signature": {
        "hash": {
          "$binary": {
            "base64": "PHh4eHh4eD4=",
            "subType": "00"
          }
        },
        "keyId": 7090947493382324000
      }
    },
    "$db": "superhosttools"
  },
  "originatingCommand": {
    "aggregate": "reservations",
    "pipeline": [
      {
        "$changeStream": {
          "fullDocument": "updateLookup"
        }
      }
    ],
    "cursor": {},
    "lsid": {
      "id": {
        "$binary": {
          "base64": "n8eH91eURw+xpT6fNEPURQ==",
          "subType": "04"
        }
      }
    },
    "$clusterTime": {
      "clusterTime": {
        "$timestamp": {
          "t": 1659064839,
          "i": 29
        }
      },
      "signature": {
        "hash": {
          "$binary": {
            "base64": "PHh4eHh4eD4=",
            "subType": "00"
          }
        },
        "keyId": 7090947493382324000
      }
    },
    "$db": "superhosttools"
  },
  "planSummary": [
    {
      "COLLSCAN": {}
    }
  ],
  "cursorid": 8223354687588024000,
  "keysExamined": 0,
  "docsExamined": 26879,
  "numYields": 210,
  "nreturned": 1000,
  "reslen": 15283228,
  "locks": {
    "ReplicationStateTransition": {
      "acquireCount": {
        "w": 2206
      }
    },
    "Global": {
      "acquireCount": {
        "r": 2206
      }
    },
    "Database": {
      "acquireCount": {
        "r": 2206
      }
    },
    "Collection": {
      "acquireCount": {
        "r": 1994
      }
    },
    "Mutex": {
      "acquireCount": {
        "r": 1996
      }
    },
    "oplog": {
      "acquireCount": {
        "r": 211
      }
    }
  },
  "storage": {
    "data": {
      "bytesRead": 2083760,
      "timeReadingMicros": 4772
    }
  },
  "protocol": "op_msg",
  "millis": 249,
  "v": "4.2.21"
}
This looks like the interesting part. We use change streams, but I don’t know why we would get 1000 results:
"pipeline": [
{
"$changeStream": {
"fullDocument": "updateLookup"
}
}
],
I’m trying to optimize my MongoDB server. Any help making this query more efficient is appreciated.
Update #1
I removed the {"fullDocument": "updateLookup"} parameter from my watch code, which seemed to help, but I’m still getting some similar queries returning 1000 documents:
"aggregate": "reservations",
"pipeline": [
{
"$changeStream": {
"fullDocument": "default"
}
}
],
I’m now using the following code to implement the change stream:
Reservation.watch([]).on("change", async (change: ChangeEvent<ReservationDocument>) => {...});
I’m now wondering if I should add a query to the .watch([]) call to limit the number of documents. What is considered best practice with change streams?
2 Answers
Let’s look at the originatingCommand part of your profiling output: it tells us that this getMore was triggered by your change stream aggregation pipeline.
Although you are using change streams, internally they are implemented as a resumable cursor. That cursor periodically calls the getMore command ("getMore": 8223354687588024000 in your output) to fetch data in batches, and your batch size is 1000. Have a look at the MongoDB Community discussion on this topic; it should clear up your concern.
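To make that concrete, here is a rough sketch of the change stream driven as a cursor, using the Reservation model and the ChangeEvent/ReservationDocument types from your question (the function name is just for illustration). Each time the locally buffered batch runs out, the driver issues the getMore you see in the profiler to fetch the next batch:
async function consumeReservationChanges(): Promise<void> {
  const changeStream = Reservation.watch([]);
  // hasNext()/next() drain the driver's local buffer; when it is empty,
  // the driver sends a getMore to the server for up to batchSize more events.
  while (await changeStream.hasNext()) {
    const change = (await changeStream.next()) as ChangeEvent<ReservationDocument>;
    // process a single change event here
  }
}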
Your cursors are being limited to 1000 results by batchSize. You can add batchSize as an optional parameter to the collection.watch() call (a sketch follows below). Since change streams are cursors, you can also apply a limit() to the cursor before retrieving documents. You may want to filter the stream to only the events you care about (see below); barring that, the defaults are the best practice.
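Here is what passing batchSize to watch() might look like with the question's Reservation model (a sketch, not a recommendation; the value 100 is arbitrary):
// Sketch only: batchSize caps how many events each getMore fetches.
// It does not change which events arrive, only how they are batched.
const changeStream = Reservation.watch([], { batchSize: 100 });
changeStream.on("change", async (change: ChangeEvent<ReservationDocument>) => {
  // handle the change event exactly as before
});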
The question of optimizing a MongoDB server in general is too broad for this question, so I’ll keep this response limited to change streams, as that appears to be the specific use case OP is asking about.
Starting in MongoDB 5.1, change streams are optimized, providing more efficient resource utilization and faster execution of some aggregation pipeline stages. If you aren’t already on a newer version, updating to 5.1 or newer will provide a performance boost.
You can look at the Change Streams Production Recommendations to see if you are complying with the official MongoDB advice; only one section of it is relevant to performance.
So if you are using collection.watch() on a very active collection and you only need to act on certain changes, use $match to filter down to just the changes you care about. For example, if you only care about items authored by "dave":
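A possible sketch (the fullDocument.author field path is an assumption made for illustration; substitute whatever field identifies the author in your schema):
// Only events whose full document has author "dave" are delivered.
// Note: for update events, fullDocument is only populated when the stream
// is opened with { fullDocument: "updateLookup" }; without it this match
// will mainly catch inserts and replaces.
const pipeline = [{ $match: { "fullDocument.author": "dave" } }];
Reservation.watch(pipeline).on("change", async (change: ChangeEvent<ReservationDocument>) => {
  // only the filtered change events arrive here
});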
More considerations on batch size
As for your previous questions about batch size, reducing the batch size won’t really have an effect on performance (with one exception noted below), so if performance is your only concern, you’ll want to look elsewhere. The docs note that, in most cases, modifying the batch size will not affect the user or the application, since the driver returns results as if MongoDB had returned a single batch.
There is one caveat, explained in great detail here: