I have a high read-traffic (on the order of 50,000 QPS) gRPC service backed by Redis, which we connect to with the Lettuce library. We update Redis from a Kafka topic: for String keys, we append Long integers to a Redis Set as the data comes in.
When a request is made to the service with a String key, we want to pull all members of the Set for that key with minimal latency, as our service needs to return a response in around 10ms. The size of a Set can vary between 100 and 100,000. Between String and byte array encoding, which would be the faster way to fetch these Long integers in terms of serialization/deserialization?
Using String: I convert the integers to Strings using Long.toString and push them to the Redis Set, and later use Long.parseLong to convert each String back to a Long integer.
Using Byte Array: Using protocol buffers (Protobuf), I create a Protobuf message holding a single Long (int64) value. I wrap the Long integer in this message, convert it to bytes using toByteArray(), and push it to the Redis Set; on read, I parse the byte array back into the message and extract the Long integer from it.
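Roughly, the two options look like this (a sketch, not our production code; Int64Value stands in for the single-field message):

```java
import com.google.protobuf.Int64Value;
import com.google.protobuf.InvalidProtocolBufferException;

public class MemberEncodings {
    // Option 1: String encoding of the set member
    static String encodeAsString(long value) {
        return Long.toString(value);
    }
    static long decodeFromString(String member) {
        return Long.parseLong(member);
    }

    // Option 2: Protobuf byte array encoding of the set member
    static byte[] encodeAsProto(long value) {
        return Int64Value.newBuilder().setValue(value).build().toByteArray();
    }
    static long decodeFromProto(byte[] member) throws InvalidProtocolBufferException {
        return Int64Value.parseFrom(member).getValue();
    }
}
```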
Are there any other, faster approaches, such as using a Lua script to fetch all Set elements?
2 Answers
I would take a step back and try to consider the entire system rather than fixating on serialization/deserialization. What is the actual bottleneck? What are you doing at that point? Are you querying individual elements inside redis? (in which case, latency is your likely limit, and you can’t change that without changing how everything interconnects) or are you fetching all the values out / inserting bulk values? In that scenario then bandwidth is more likely your limit, but if you’re doing that frequently: you probably aren’t using the redis "set" as a "set", and other options might be beneficial.
You also don’t show exactly how you’re using Lettuce: if you’re doing a large number of operations, "pipelining" is your friend for avoiding per-operation latency costs. Without seeing exactly what you’re doing, it is hard to know whether you’re currently paying this cost, and whether this is in fact: the problem. Even if pipelining isn’t available in your client, batching via things like Lua might be – sending 50 (say) elements per round trip can make a huge difference in overall cost (pipelining would be better).
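For illustration, a minimal sketch of what manual pipelining looks like in Lettuce (the key name and batch size are arbitrary; this uses the async API with auto-flush disabled):

```java
import io.lettuce.core.LettuceFutures;
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisFuture;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.async.RedisAsyncCommands;

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {
    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://localhost:6379");
        try (StatefulRedisConnection<String, String> conn = client.connect()) {
            RedisAsyncCommands<String, String> async = conn.async();

            conn.setAutoFlushCommands(false); // queue commands locally
            List<RedisFuture<Long>> pending = new ArrayList<>();
            for (long v = 0; v < 50; v++) {
                pending.add(async.sadd("mykey", Long.toString(v)));
            }
            conn.flushCommands(); // one network write for the whole batch
            LettuceFutures.awaitAll(Duration.ofSeconds(5),
                    pending.toArray(new RedisFuture[0]));
            conn.setAutoFlushCommands(true);
        } finally {
            client.shutdown();
        }
    }
}
```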
The difference in ASCII vs protobuf vs CPU (fixed width / endianness) encoding can matter in some scenarios, but in this specific case: the redis protocol itself has overheads per element that are going to make that moot for these small values (unless those overheads are amortized, for example by encoding multiple values in a single payload).
For context: I am intimately familiar with both redis and protobuf, and work in network comms on a daily basis.
Redis sets are very powerful for server-side (redis) operations, but if you find yourself frequently fetching all the contents from a set: something has gone very wrong. In this case, the set element values are small, which means you will be paying a relatively large amount of protocol overhead, deframe overhead, parse overhead, etc – just to deal with how redis expresses sequences.
However! Let’s imagine we held the N items in a redis set (for the redis unique-add capability), but also in a redis binary string of 8×N bytes – just the raw bytes of the integers slammed one after the other. We could post-process the sets into the string, or we could use Lua to add the elements.
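For example, a sketch of such a dual-write script, driven from Lettuce (the key names and little-endian byte order are illustrative choices, not a prescription):

```java
import io.lettuce.core.RedisClient;
import io.lettuce.core.ScriptOutputType;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.codec.ByteArrayCodec;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class DualWriteSketch {
    // Add to the set; only when the member is new, append its raw 8 bytes
    // to the companion string. KEYS[1]=set key, KEYS[2]=string key,
    // ARGV[1]=set member, ARGV[2]=raw 8-byte value.
    static final String ADD_SCRIPT =
        "local added = redis.call('SADD', KEYS[1], ARGV[1]) " +
        "if added == 1 then redis.call('APPEND', KEYS[2], ARGV[2]) end " +
        "return added";

    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://localhost:6379");
        try (StatefulRedisConnection<byte[], byte[]> conn =
                 client.connect(ByteArrayCodec.INSTANCE)) {
            long value = 12345L;
            byte[] raw = ByteBuffer.allocate(Long.BYTES)
                    .order(ByteOrder.LITTLE_ENDIAN) // must match the reader
                    .putLong(value).array();
            Long added = conn.sync().eval(ADD_SCRIPT, ScriptOutputType.INTEGER,
                    new byte[][] {
                        "mykey:set".getBytes(StandardCharsets.UTF_8),
                        "mykey:str".getBytes(StandardCharsets.UTF_8)
                    },
                    Long.toString(value).getBytes(StandardCharsets.UTF_8), raw);
            System.out.println("newly added: " + added);
        } finally {
            client.shutdown();
        }
    }
}
```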
With the data as a redis string, we can do a single GET; this is much simpler than SMEMBERS – and we pay virtually no protocol overhead. Equally, the receiving library doesn’t have to parse the incoming fragments into a typed array – most libraries have simple primitives for raw byte buffers. If we are overly concerned about very large fetches, we could use SUBSTR in a loop to read chunks, rather than fetching everything at once. And depending on your platform capabilities: once we have a raw binary buffer, we might be able to use "punning" (think "reinterpret cast") rather than any kind of deserialize/parse step, so that we don’t have to process the results in any way: we just take them as we see them.
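Java has no true reinterpret cast, but a sketch of the closest read path with Lettuce might look like this (key name and byte order again illustrative, and they must match whatever the writer used):

```java
import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.codec.ByteArrayCodec;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

public class PackedReadSketch {
    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://localhost:6379");
        try (StatefulRedisConnection<byte[], byte[]> conn =
                 client.connect(ByteArrayCodec.INSTANCE)) {
            // one GET: no per-element framing to parse
            byte[] raw = conn.sync().get("mykey:str".getBytes());
            LongBuffer longs = ByteBuffer.wrap(raw)
                    .order(ByteOrder.LITTLE_ENDIAN) // must match the writer
                    .asLongBuffer();
            long[] values = new long[longs.remaining()];
            longs.get(values); // bulk view: no per-element parse step
            System.out.println("fetched " + values.length + " values");
        } finally {
            client.shutdown();
        }
    }
}
```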
Now the question becomes: is this faster? Well, the easiest way to answer such a question is to test it, so: let’s do that! I’m not a Java person – I’m a C# head, but we’re not testing the platform here – we’re trying to answer "could this help?", so: let’s not worry about that. Long story short (too late!), the answer is "yes": in my benchmark with 50,000 elements, the SMEMBERS approach is 147 times more expensive than the single GET. I also ran a sum test, to check both approaches are seeing the same data (the numbers will be different each run, but should match each other).
Finally, here’s my entire test suite, in case I’ve done something obviously wrong:
I can’t guarantee that you’ll have the exact same results in Java with Lettuce, but: this is interesting enough to at least make it worth trying to port the experiment to find out.
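As a starting point for such a port, here is a rough sketch of the core comparison in Java with Lettuce (my sketch, not the original suite; key names, seeding strategy, and the crude timing are all simplifications):

```java
import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;
import io.lettuce.core.codec.ByteArrayCodec;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class SetVsStringBenchmark {
    static final int N = 50_000;

    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://localhost:6379");
        try (StatefulRedisConnection<byte[], byte[]> conn =
                 client.connect(ByteArrayCodec.INSTANCE)) {
            RedisCommands<byte[], byte[]> redis = conn.sync();
            byte[] setKey = "bench:set".getBytes(StandardCharsets.UTF_8);
            byte[] strKey = "bench:str".getBytes(StandardCharsets.UTF_8);
            redis.del(setKey, strKey);

            // seed the same N values into both shapes
            ByteBuffer packed = ByteBuffer.allocate(N * Long.BYTES)
                    .order(ByteOrder.LITTLE_ENDIAN);
            for (long v = 0; v < N; v++) {
                redis.sadd(setKey, Long.toString(v).getBytes(StandardCharsets.UTF_8));
                packed.putLong(v);
            }
            redis.set(strKey, packed.array());

            // fetch via SMEMBERS, parsing each member
            long t0 = System.nanoTime();
            long sum1 = 0;
            for (byte[] member : redis.smembers(setKey)) {
                sum1 += Long.parseLong(new String(member, StandardCharsets.UTF_8));
            }
            long t1 = System.nanoTime();

            // fetch via a single GET, reinterpreting the raw bytes
            long sum2 = 0;
            ByteBuffer raw = ByteBuffer.wrap(redis.get(strKey))
                    .order(ByteOrder.LITTLE_ENDIAN);
            while (raw.hasRemaining()) {
                sum2 += raw.getLong();
            }
            long t2 = System.nanoTime();

            System.out.printf("SMEMBERS: %d us, sum=%d%n", (t1 - t0) / 1_000, sum1);
            System.out.printf("GET:      %d us, sum=%d%n", (t2 - t1) / 1_000, sum2);
        } finally {
            client.shutdown();
        }
    }
}
```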
Since you mention gRPC, it is also worth noting that if your schema defines repeated int64 values = ... for the values, there is also a sneaky way to present this from a raw buffer using "packed" encoding, so: zero translation again, all the way from redis to gRPC.
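As an illustration of that framing trick (with an assumption on my part: the byte-for-byte version works when the field is a fixed-width type such as repeated sfixed64, whose packed body is exactly the raw little-endian values; a varint-packed int64 field would still need per-element re-encoding), a sketch using protobuf-java’s CodedOutputStream:

```java
import com.google.protobuf.CodedOutputStream;
import com.google.protobuf.WireFormat;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class PackedFieldSketch {
    // Frame a raw buffer of little-endian longs as a packed repeated
    // sfixed64 field (field number 1 here is an assumption about the
    // schema) without touching the individual elements.
    static byte[] frameAsPackedField(byte[] rawLongs) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        CodedOutputStream cos = CodedOutputStream.newInstance(out);
        cos.writeTag(1, WireFormat.WIRETYPE_LENGTH_DELIMITED);
        cos.writeUInt32NoTag(rawLongs.length); // packed field body length
        cos.writeRawBytes(rawLongs);           // the redis string, verbatim
        cos.flush();
        return out.toByteArray();
    }
}
```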