I want to run a poller on an Azure Cosmos DB container that runs at regular intervals and fires a select query, something like:
SELECT TOP 100 * FROM c ORDER BY c._ts DESC
After selecting the rows, I will do my operations in the application. The problem is that if I need to scale this poller horizontally to, let's say, 2 instances in the future, and both of them run at the same time, how do I prevent them from picking the same set of rows?
Also, I am not using the change feed processor because, for it to scale horizontally, we need more physical partitions. I need to scale horizontally even within one physical partition.
I can implement optimistic locking, but I do not want to select rows and then reject them. Is there any way to make each instance pick a different set of rows in the select operation itself?
2 Answers
Have a single process read the data and distribute the rows over a horizontally scalable processor. That single process could be a timer-triggered Azure Function with a Cosmos DB input binding, or your own application. My point is: you should not scale the poller, but the processing.
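As a rough illustration of the "one reader, many processors" idea, here is a minimal in-process sketch in Python. It is not tied to any Azure SDK: fetch_batch and process are hypothetical stand-ins for the real Cosmos DB query and the real per-row work, and the in-memory queue stands in for whatever distribution mechanism you would actually use (a message queue, for example).

```python
import queue
import threading


def fetch_batch():
    """Hypothetical stand-in for the single poller's Cosmos DB query."""
    return [{"id": str(i), "_ts": 1000 + i} for i in range(10)]


def process(row):
    """Hypothetical per-row work; replace with the real operation."""
    return row["id"]


def run_once(num_workers=2):
    """Read one batch in a single place and fan it out to the workers."""
    work = queue.Queue()
    handled = []  # (worker index, row id) pairs, kept for inspection
    lock = threading.Lock()

    def worker(index):
        while True:
            row = work.get()
            if row is None:  # sentinel: no more rows for this worker
                break
            result = process(row)
            with lock:
                handled.append((index, result))

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for t in threads:
        t.start()

    # Only this one reader queries the container, so each row is queued once
    # and handed to exactly one worker.
    for row in fetch_batch():
        work.put(row)

    # One sentinel per worker, queued after all rows.
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return handled
```

Because only the reader queries the container, adding more workers does not create duplicate reads; the number of workers can change without touching the query.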
Some options:
If your concern is scaling the Change Feed of Cosmos DB, then rather than building something custom, or polling, which will consume RUs and need a timer infrastructure, why don’t you listen to the Change Feed and add each change as a message to a Service Bus queue or an Event Grid topic? Then have another Azure Function, Logic App, or your preferred compute listen to the Service Bus queue and process the messages.
Also, the Service Bus queue can use a different partitioning scheme from your Cosmos DB container, which lets you run multiple brokers. You can also create different subscriptions on an Event Grid topic, or use Service Bus topics, to build a priority queue and process certain changes with high priority, if your application demands it.
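To make the "different partitioning scheme" point concrete, here is a small Python sketch, under the assumption that each changed document is assigned to a bucket derived from its id rather than from the Cosmos DB partition key. The names session_for and route_changes are hypothetical; in a real setup the bucket number might be used as something like a Service Bus session ID or partition key, but that wiring is not shown here.

```python
import hashlib


def session_for(doc_id, num_sessions):
    """Map a document id to a stable bucket in the range [0, num_sessions)."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_sessions


def route_changes(changes, num_sessions):
    """Group changed documents by bucket; each bucket can feed one consumer."""
    buckets = {i: [] for i in range(num_sessions)}
    for doc in changes:
        buckets[session_for(doc["id"], num_sessions)].append(doc)
    return buckets
```

Since the bucket depends only on the document id, the same document always lands with the same consumer, and the number of buckets is independent of how many physical partitions the container has.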