I have a requirement to build a real-time, low-latency messaging system. I read the ZeroMQ guide, and for my requirement the ROUTER-DEALER pattern matches perfectly. I built the system around it and it works fine, but during performance testing I found that ZeroMQ's latency is staggeringly high.
libzmq version : 4.2.5
./local_lat tcp://127.0.0.1:5555 1 1000
./remote_lat tcp://127.0.0.1:5555 1 1000
message size: 1 [B]
roundtrip count: 1000
average latency: 26.123 [us]
When the roundtrip count is high, ZeroMQ's performance is excellent, but when the same test is run with a roundtrip count of 1, the latency is quite high.
./local_lat tcp://127.0.0.1:5555 1 1
./remote_lat tcp://127.0.0.1:5555 1 1
message size: 1 [B]
roundtrip count: 1
average latency: 506.500 [us]
Still not convinced, I took the PUB-SUB example from the ZeroMQ guide, modified it to ROUTER-DEALER, added a timestamp on each side, and tested again. There, too, the latency is very high. This makes ZeroMQ unusable for my case. I agree that when message volumes are high nothing beats ZeroMQ, but for systems where latency is critical even for a single message, ZeroMQ falls short.
Note: I also ran the original PUB-SUB example, and it shows the same latency figures.
sender.cpp
#include <zmq.hpp>
#include <stdlib.h>
#include <unistd.h>
#include <iostream>
#include <sys/time.h>

int main () {
    // Prepare our context and ROUTER socket
    zmq::context_t context (1);
    zmq::socket_t publisher (context, ZMQ_ROUTER);
    publisher.bind("tcp://*:5556");

    // Wait for the peer to connect; once connected, store its identity
    // so we can address messages to it later
    zmq::message_t identity, m;
    publisher.recv(&identity); // identity frame
    publisher.recv(&m);        // message frame
    sleep(1);                  // give the connection time to settle

    struct timeval timeofday;
    int i = 1;                 // number of messages to send
    while (i) {
        zmq::message_t id, message("10101", 5);
        id.copy(&identity);
        gettimeofday(&timeofday, NULL);
        publisher.send(id, ZMQ_SNDMORE);
        publisher.send(message);
        std::cout << timeofday.tv_sec << ", " << timeofday.tv_usec << std::endl;
        usleep(1);
        --i;
    }
    return 0;
}
receiver.cpp
#include <zmq.hpp>
#include <iostream>
#include <sys/time.h>

int main (int argc, char *argv[])
{
    zmq::context_t context (1);
    zmq::socket_t subscriber (context, ZMQ_DEALER);
    subscriber.setsockopt(ZMQ_IDENTITY, "1", 1);
    subscriber.connect("tcp://localhost:5556");

    struct timeval timeofday;
    subscriber.send(" ", 1); // announce ourselves so the ROUTER learns our identity

    int update_nbr;
    int i = 1;               // number of messages to receive
    for (update_nbr = 0; update_nbr < i; update_nbr++) {
        zmq::message_t update;
        subscriber.recv(&update);
        gettimeofday(&timeofday, NULL);
        std::cout << timeofday.tv_sec << ", " << timeofday.tv_usec << std::endl;
    }
    return 0;
}
sender output
1562908600, 842072
receiver output
1562908600, 842533
As you can see, a single message takes about 460 µs to get through.
Is there any way to reduce the latency?
2 Answers
The overhead of setting up a TCP connection is not trivial, and it increases with ZMQ because, behind the scenes, establishing a connection involves some additional back-and-forth on top of the TCP handshake itself.
So the latency of your first message is not just the latency of the message itself; it is the latency of the message plus the latency of the traffic involved in setting up the connection.
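A minimal sketch of how to see this, using the same old-style cppzmq API as the question's code. The port and the echoing ROUTER peer are assumptions for illustration, not taken from the question: the idea is simply that the first round trip after connect() pays the setup cost, while the second does not.

#include <zmq.hpp>
#include <chrono>
#include <iostream>

int main() {
    zmq::context_t context(1);
    zmq::socket_t dealer(context, ZMQ_DEALER);

    auto t0 = std::chrono::steady_clock::now();
    dealer.connect("tcp://localhost:5556");   // returns immediately; the connect is asynchronous

    // First round trip: its cost includes the TCP and ZMTP handshakes.
    // (Assumes a ROUTER peer bound on this port that echoes each payload back.)
    zmq::message_t ping("x", 1), reply;
    dealer.send(ping);
    dealer.recv(&reply);
    auto t1 = std::chrono::steady_clock::now();

    // Second round trip: the connection is already established.
    zmq::message_t ping2("x", 1);
    dealer.send(ping2);
    dealer.recv(&reply);
    auto t2 = std::chrono::steady_clock::now();

    using us = std::chrono::microseconds;
    std::cout << "first round trip (incl. setup): "
              << std::chrono::duration_cast<us>(t1 - t0).count() << " us\n"
              << "second round trip:              "
              << std::chrono::duration_cast<us>(t2 - t1).count() << " us" << std::endl;
    return 0;
}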
The other thing to note is that a 1-byte payload is a staggeringly inefficient use of the wire, even with vanilla TCP. If ZMQ does any batching when it sends, that will dramatically improve the average latency over a large number of messages. I don't know whether it does that under the hood or not, but the fact remains that a single 1-byte message is saddled with the same unavoidable TCP and ZMQ connection-setup overhead as a run of 1000 messages is; that overhead swamps the cost of sending one tiny message, while it is negligible when spread across 1000 round trips.
I recommend you send 2 messages and watch the individual latency of each one – if I'm right, the 2nd should be back down in the ~25 µs range. If that's the case, then it's up to you whether that one-time overhead represents a problem or not; I suspect in practice it won't.
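One quick way to check this against the code in the question is to loop a few times and timestamp each message. This is only a sketch of the changed send loop in sender.cpp; it assumes the rest of sender.cpp and receiver.cpp stay as they are, with the receiver's loop count raised to match.

    int i = 5;                 // send a few messages instead of one
    while (i) {
        zmq::message_t id, message("10101", 5);
        id.copy(&identity);    // re-attach the stored DEALER identity
        gettimeofday(&timeofday, NULL);
        publisher.send(id, ZMQ_SNDMORE);   // identity frame
        publisher.send(message);           // payload frame
        std::cout << timeofday.tv_sec << ", " << timeofday.tv_usec << std::endl;
        usleep(1000);          // leave a visible gap between messages
        --i;
    }

Compare each sender timestamp with the matching receiver timestamp: if the high figure is mostly one-time setup overhead, the later gaps should be much smaller than the first.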
ZMQ does not send messages strictly serially, i.e. there is not a 1:1 relationship between a zmq_send() call and a send() on the underlying TCP socket. As a message queue, it gives its internal buffer a chance to fill at least a little between actual writes to TCP. That delay is very small, but it is enough to let many messages go out at once, potentially using the whole bandwidth available between the two endpoints. If I'm not mistaken, ZMQ's IO will automatically tune itself to maximize throughput while keeping latency as low as possible if the buffer isn't filling between sends. So the true latency of an individual message over TCP may be as high as 100 ms or even more, while thousands of messages are actually being sent at once if the queue is being fed fast enough.
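A rough sketch of that decoupling, again with the question's cppzmq API. The endpoint here is an assumption for illustration; because each send only hands the message to the socket's internal queue and the IO thread, a burst of sends returns almost immediately even before a peer has accepted the connection, as long as you stay under the default high-water mark of 1000 queued messages.

#include <zmq.hpp>
#include <chrono>
#include <iostream>

int main() {
    zmq::context_t context(1);
    zmq::socket_t dealer(context, ZMQ_DEALER);
    dealer.connect("tcp://localhost:5557");  // hypothetical endpoint

    const int burst = 500;                   // below the default HWM of 1000
    auto start = std::chrono::steady_clock::now();
    for (int n = 0; n < burst; ++n) {
        zmq::message_t msg("10101", 5);
        dealer.send(msg);                    // enqueue only; no blocking TCP write here
    }
    auto end = std::chrono::steady_clock::now();

    std::cout << "average enqueue cost per message: "
              << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() / burst
              << " ns" << std::endl;
    return 0;
}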