Producer side

We are trying to ingest a huge amount of data into Azure. The data comes from sensors. Each sensor sends about 13 packages per second, around 500 KiB/s in total. There will be hundreds of thousands of sensors.

Consumer side

Then we have some consumer applications that need to retrieve this data with the lowest possible latency and in the same order in which the producer sent it. Every application needs all of the data of one sensor. Per sensor, there will be 5-10 applications/consumers.
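As a rough back-of-the-envelope check, the figures above can be turned into per-package size, aggregate ingress, and per-sensor fan-out egress. This is only a sketch of the arithmetic: the sensor count of 100,000 is an assumed lower bound for "hundreds of thousands", and all variable names are illustrative.

```python
KIB = 1024

packages_per_second = 13                   # per sensor, from the question
bytes_per_second_per_sensor = 500 * KIB    # ~500 KiB/s per sensor
sensor_count = 100_000                     # ASSUMPTION: lower bound of the stated range
consumers_per_sensor = (5, 10)             # 5-10 consumers read each sensor's stream

# Average size of a single package in KiB
avg_package_kib = bytes_per_second_per_sensor / packages_per_second / KIB

# Total ingress across all sensors in GiB/s
total_ingress_gib = sensor_count * bytes_per_second_per_sensor / KIB**3

# Egress needed for ONE sensor's stream in MiB/s, since every consumer reads all of it
egress_per_sensor_mib = [
    n * bytes_per_second_per_sensor / KIB**2 for n in consumers_per_sensor
]

print(f"average package size: {avg_package_kib:.1f} KiB")
print(f"total ingress: {total_ingress_gib:.1f} GiB/s")
print(f"egress per sensor: {egress_per_sensor_mib[0]:.2f} to {egress_per_sensor_mib[1]:.2f} MiB/s")
```

The point of the calculation is that read traffic for one sensor's stream is a multiple of its write traffic, so the consumer count matters as much as the producer rate.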

First approach

First, we tried to solve this with Event Hubs. That looked the most promising for queuing the data and distributing it to consumers. But during load tests, we found a hard limit when all the data goes through one partition (which we need, because the data must stay ordered at all times). With one partition and the given load, we could handle roughly 4-5 consumers. With more than 5 consumers, things got laggy and the consumers could no longer keep up with the producer side.

This appears to be a limitation of the Event Hubs partition itself and has nothing to do with either the tier (Standard vs. Premium made no difference) or the scaling units (increasing them did not make any difference).

Azure's recommendation for more throughput is simple: increase the number of partitions per event hub. But with that we would lose the ordering of the events, and the clients would have to restore it themselves (which is not really achievable). So it seems to us that Event Hubs is not exactly what we need, or that we are not using it properly.

Further thoughts

We then searched a lot for other services that could give us the required features. We had a look at Service Bus (which we already use for microservice communication). But there, the limits on the amount of data seem to be even more restrictive.

Question

What would be the ideal solution for this problem? Can it be solved with one service, or do we need to combine several services (like Event Hubs -> Stream Analytics jobs -> Service Bus)?

2 Answers


  1. Chosen as BEST ANSWER

    After discussing with Azure support, we decided to build a combined solution with Event Hubs and Azure Function Apps like this: [image]

    This is currently the best solution for us in terms of costs and complexity.


  2. Here’s what I would do as a proof of concept:

    - Work with Azure Event Hubs (with multiple partitions).

    - Set up Azure Stream Analytics and query events in near real time using a windowing function. Then I would sink the data to an Azure Service Bus topic, which subscribers would consume later.
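
A note on the multi-partition suggestion: Event Hubs sends events that share a partition key to the same partition, so using the sensor ID as the key should keep each sensor's events in order even with many partitions. The sketch below only illustrates that routing idea with a stand-in hash. `partition_for` and `route` are hypothetical helpers, not part of any Azure SDK, and the real service uses its own hashing. It also does not lift the throughput limit of a single partition described in the question.

```python
import hashlib
from collections import defaultdict


def partition_for(sensor_id: str, partition_count: int) -> int:
    """Map a sensor ID to a partition index with a stable hash (illustrative only)."""
    digest = hashlib.sha256(sensor_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def route(events, partition_count):
    """Append each (sensor_id, sequence) event to the partition chosen by its key."""
    partitions = defaultdict(list)
    for sensor_id, seq in events:
        partitions[partition_for(sensor_id, partition_count)].append((sensor_id, seq))
    return partitions


# Interleaved events from two sensors, in the order the producers sent them
events = [
    ("sensor-a", 1), ("sensor-b", 1), ("sensor-a", 2),
    ("sensor-b", 2), ("sensor-a", 3),
]
partitions = route(events, partition_count=4)

for index, items in sorted(partitions.items()):
    print(index, items)
```

Because the same key always maps to the same partition, a consumer reading that partition sees each sensor's sequence numbers in ascending order, although events from different sensors may be interleaved.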
