
My question is rather specific, so I will be fine with a general answer that points me in the right direction.

Description of the problem:
I want to deliver specific task data from multiple producers to the particular consumer working on that task (both are Docker containers run in k8s). The relation is many-to-many: any producer can create a data packet for any consumer. Each consumer processes ~10 streams of data at any given moment, while each data stream consists of 100 messages of 160 B per second (from different producers).

Current solution:
In our current solution, each producer keeps a cache of task: (IP:PORT) pairs for the consumers and uses UDP packets to send the data directly. It scales nicely but is rather messy to deploy.
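A rough sketch of what each producer does today; the task ID and address are illustrative placeholders:

```python
import socket

# task -> (IP, PORT) of the consumer working on it (illustrative values)
consumer_cache = {"task-42": ("10.0.3.17", 9000)}

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_packet(task_id: str, payload: bytes) -> None:
    addr = consumer_cache[task_id]  # cached consumer endpoint
    sock.sendto(payload, addr)      # fire-and-forget, no delivery guarantee

send_packet("task-42", b"\x00" * 160)  # one 160 B message
```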

Question:
Could this be realized in the form of a message queue of sorts (Kafka, Redis, RabbitMQ, ...)? E.g., having a channel for each task where producers send data while the consumer, well, consumes it? How many streams would be feasible for the MQ to handle (I know it would differ; suggest your best estimate)?

Edit: Would 1000 streams, which equal 100,000 messages per second, be feasible? (The throughput for 1000 streams is 16 MB/s.)

Edit 2: Fixed packet size to 160 B (typo)

2 Answers


  1. A basic approach

    From a Kafka perspective, each stream in your problem can map to one topic, and therefore there is one producer-consumer pair per topic.

    Con: If you have lots of streams, you will end up with a lot of topics, and IMO the solution can get messier here too as you increase the number of topics.


    An alternative approach

    Alternatively, the best way is to map multiple streams to one topic, where each stream is separated by a key (like the IP:Port combination you use), and then have multiple consumers, each subscribing to a specific set of partitions as determined by the key; a sketch follows below. Partitions are the point of scalability in Kafka.

    Con: Though you can increase the no. of partitions, you cannot decrease them.
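    A minimal sketch of the keyed approach, assuming the kafka-python client; the topic name, broker address, and using IP:Port as the stream key are illustrative assumptions:

    ```python
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="kafka:9092")

    def publish(stream_id: str, payload: bytes) -> None:
        # Messages with the same key hash to the same partition, so each
        # stream stays ordered and always lands with the same consumer.
        producer.send("task-data", key=stream_id.encode(), value=payload)

    publish("10.0.3.17:9000", b"\x00" * 160)  # one 160 B message of one stream
    producer.flush()
    ```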


    Type of data matters

    If your streams are heterogeneous, in the sense that it would not be apt for all of them to share a common topic, you can create more topics.

    Usually, topics are determined by the data they host and/or what their consumers do with the data in the topic. If all of your consumers do the same thing i.e. have the same processing logic, it is reasonable to go for one topic with multiple partitions.

    Some points to consider:

    Unlike in your current solution (I suppose), a message does not get lost once it is received and processed; rather, it stays in the topic until the configured retention period elapses.

    Take proper care in determining the keying strategy, i.e. which messages land in which partitions. As said earlier, if all of your consumers do the same thing, all of them can be in one consumer group to share the workload.

    Consumers belonging to the same group perform a common task and will subscribe to a set of partitions determined by the partition assignor. Each consumer will then get a set of keys, in other words a set of streams, or, as per your current solution, a set of one or more IP:Port pairs; see the sketch below.
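    A matching consumer sketch, again assuming kafka-python; every consumer started with the same group_id shares the topic's partitions (names are illustrative, and process() is a hypothetical per-stream handler):

    ```python
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "task-data",
        bootstrap_servers="kafka:9092",
        group_id="task-workers",  # same group => partitions split among members
    )

    for record in consumer:
        stream_id = record.key.decode()   # the key identifies the stream
        process(stream_id, record.value)  # hypothetical per-stream handler
    ```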

  2. Unless you need disk persistence, do not even look in the message broker direction. You are just adding one problem to another. Direct network code is a proper way to solve audio broadcast. Now, if your code is messy and you want a simplified programming model, a good alternative to raw sockets is the ZeroMQ library. It will give you all the message-broker functionality you care about: a) discrete messaging instead of streams, b) client discoverability; all without going overboard with another software layer. A sketch follows below.
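    A minimal pyzmq sketch of that model, assuming each consumer binds a PULL socket and producers connect PUSH sockets to the consumers they feed; the addresses are placeholders and would come from the same task cache you already maintain:

    ```python
    import zmq

    ctx = zmq.Context()

    # consumer side: bind once, then receive discrete messages
    pull = ctx.socket(zmq.PULL)
    pull.bind("tcp://*:9000")

    # producer side: connect to the consumer handling this task
    push = ctx.socket(zmq.PUSH)
    push.connect("tcp://localhost:9000")  # consumer's address in production
    push.send(b"\x00" * 160)              # one 160 B message, framed by ZeroMQ

    msg = pull.recv()                     # arrives as one discrete message
    ```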

    When it comes to "feasible": 100,000 messages per second with 160 kb messages is a lot of data; it comes to 16 Gb/sec (100,000 × 160 kb) even without any messaging protocol on top of it. In general, Kafka shines at message throughput of small messages, as it batches messages on many layers. Knowing this, the sustained performance of Kafka is usually constrained by disk speed, as Kafka is intentionally written this way (the slowest component is the disk). However, your messages are very large, and you need to both write and read messages at the same time, so I don't see it happening without a large cluster installation, as your problem is actual data throughput, not the number of messages.

    Because you are data-limited, even other classic MQ software like ActiveMQ, IBM MQ, etc. is actually able to cope very well with your situation. In general, classic brokers are much more "chatty" than Kafka and cannot hit Kafka's message throughput when handling small messages. But as long as you are using large non-persistent messages (and a proper broker configuration), you can expect decent performance in MB/sec from those too. Classic brokers will, with proper configuration, directly connect a socket of a producer to a socket of a consumer without hitting a disk; in contrast, Kafka will always persist to disk first. So they even have some latency advantages over Kafka.
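    For example, a non-persistent publish to RabbitMQ with the pika client might look like this (queue name and host are assumptions; delivery_mode=1 marks the message transient so the broker can keep it off disk):

    ```python
    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq"))
    ch = conn.channel()
    ch.queue_declare(queue="task-data", durable=False)  # transient queue

    ch.basic_publish(
        exchange="",
        routing_key="task-data",
        body=b"\x00" * 160,
        properties=pika.BasicProperties(delivery_mode=1),  # 1 = non-persistent
    )
    conn.close()
    ```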

    However, this direct socket-to-socket "optimisation" is just a full-circle turn back to the start of this answer. Unless you need audio stream persistence, all you are doing with a broker-in-the-middle is finding an indirect way of binding producing sockets to consuming ones and then sending discrete messages over this connection. If that is all you need, ZeroMQ is made for this.

    There is also a messaging protocol called MQTT, which may be of interest to you if you choose to pursue a broker solution, as it is meant to be an extremely scalable solution with low overhead.
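    A minimal paho-mqtt sketch, assuming one topic per task and QoS 0 for the lowest overhead; the broker host, topic naming, and handle() are illustrative:

    ```python
    import paho.mqtt.client as mqtt

    def on_message(client, userdata, msg):
        handle(msg.topic, msg.payload)  # hypothetical 160 B message handler

    client = mqtt.Client()
    client.on_message = on_message
    client.connect("mqtt-broker", 1883)
    client.subscribe("tasks/task-42", qos=0)               # consumer side
    client.publish("tasks/task-42", b"\x00" * 160, qos=0)  # producer side
    client.loop_forever()
    ```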
