I am using change streams from DocumentDB to read time-ordered events with Lambda, with an EventBridge rule invoking the Lambda every 10 minutes to archive the data to S3. Is there a way to scale reads from the change stream using a resume token and a polling model? With a single Lambda reading from the change stream, my archival process falls far behind: our application writes a couple of million records during peak periods, but the archival process can archive at most 500k records to S3. Is there a way to scale this process? Running Lambdas in parallel might not work, as that would lead to a race condition.
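For reference, the single-reader polling step described above can be sketched as one invocation that resumes from a saved token, drains a bounded batch, and hands back the new token. This is a minimal sketch assuming pymongo's `ChangeStream`, whose `try_next()` returns `None` instead of blocking when no change is currently available and whose `resume_token` property holds the token for the last returned change. The S3 upload and token storage are left out, and `FakeStream` is a hypothetical in-memory stand-in used only to exercise the function.

```python
from typing import Any, Dict, List, Optional, Tuple

Event = Dict[str, Any]


def drain_batch(stream: Any, max_events: int) -> Tuple[List[Event], Optional[Event]]:
    """Pull up to max_events changes from a pymongo-style ChangeStream.

    try_next() returns None when nothing is currently available, so a
    scheduled Lambda can stop and exit instead of waiting forever.
    Returns the events plus the resume token to persist for the next run.
    """
    events: List[Event] = []
    while len(events) < max_events:
        change = stream.try_next()
        if change is None:  # caught up with the stream for now
            break
        events.append(change)
    # Token of the last returned change; resuming after it skips what was read.
    return events, stream.resume_token


class FakeStream:
    """Hypothetical in-memory stand-in for a ChangeStream (testing only)."""

    def __init__(self, changes: List[Event]) -> None:
        self._changes = list(changes)
        self.resume_token: Optional[Event] = None

    def try_next(self) -> Optional[Event]:
        if not self._changes:
            return None
        change = self._changes.pop(0)
        self.resume_token = change["_id"]  # a change event's _id is its resume token
        return change


stream = FakeStream([{"_id": {"_data": str(i)}, "operationType": "insert"} for i in range(5)])
first, token1 = drain_batch(stream, max_events=3)
second, token2 = drain_batch(stream, max_events=3)
third, token3 = drain_batch(stream, max_events=3)
```

In a real handler the token would only be saved after the batch has been written to S3, and then passed as `resume_after` to `collection.watch()` on the next run. This keeps a single reader correct, but it does not by itself raise throughput; the answers below address parallelism.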
Question posted in Amazon Web Services
The official Amazon Web Services documentation can be found here.
3 Answers
Can't you use Step Functions? Your EventBridge rule fires the Lambda as a step in a Step Functions state machine, which can then keep the state while archiving the records.

I am not certain about DocumentDB, but I believe in MongoDB you can create a change stream with a filter. That way you can have multiple change streams, each acting on a portion (filter) of the data, which allows several change streams to work concurrently on one cluster.
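The filtered change stream idea above can be sketched as disjoint `$match` filters, one per worker, so that each worker keeps its own resume token and no two workers archive the same event. This is a sketch under stated assumptions: `partition_key` is a hypothetical non-negative integer field the application would have to write into every document, and the filter uses MongoDB's `$mod` query operator. Whether DocumentDB accepts such a pipeline on `watch()`, and whether it actually improves throughput (each stream may still scan the same change log), is something to verify against the DocumentDB documentation, as the answer itself notes. Delete events carry no `fullDocument`, so this particular filter drops them.

```python
from typing import Any, Dict, List

Stage = Dict[str, Any]


def partition_pipeline(worker_index: int, num_workers: int,
                       field: str = "fullDocument.partition_key") -> List[Stage]:
    """$match stage keeping only changes where field % num_workers == worker_index.

    `partition_key` is a hypothetical non-negative integer field.
    """
    if not 0 <= worker_index < num_workers:
        raise ValueError("worker_index must be in [0, num_workers)")
    return [{"$match": {field: {"$mod": [num_workers, worker_index]}}}]


def event_matches(event: Dict[str, Any], pipeline: List[Stage]) -> bool:
    """Local re-implementation of the single $mod filter, for testing only."""
    path, cond = next(iter(pipeline[0]["$match"].items()))
    value: Any = event
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return False  # e.g. delete events have no fullDocument
        value = value[part]
    divisor, remainder = cond["$mod"]
    return value % divisor == remainder


# With pymongo, worker i of n would open its own stream and store its own token:
#   coll.watch(pipeline=partition_pipeline(i, n), full_document="updateLookup",
#              resume_after=saved_token_for_worker_i)
pipelines = [partition_pipeline(i, 4) for i in range(4)]
```

Because the partitions are disjoint and each worker checkpoints independently, running workers in parallel avoids the race condition mentioned in the question, at the cost of requiring a partitioning field in every document.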
My 2 cents: Instead of a Lambda script, use monstache configured with multiple workers.