
I have a high read-traffic (on the order of 50,000 QPS) gRPC service backed by a Redis database, which we connect to with the Lettuce library. We update Redis from a Kafka topic: for String keys, we append Long integers to a Redis Set as the data comes in.

When a request is made to the service with a String key, we want to pull all members of the Set for that key with minimal latency, as our service needs to return a response in around 10 ms. The size of a Set can vary from 100 to 100,000 elements. Between String and byte array encoding, which would be the faster way to fetch these Long integers, in terms of serialization/deserialization?

Using String: I convert the integers to Strings using Long.toString and push them to the Redis set, and later use Long.parseLong to convert each String back to a Long integer.
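
For reference, this is roughly what the String approach looks like with Lettuce's synchronous API (the connection address and key name are just placeholders):

    import io.lettuce.core.RedisClient;
    import io.lettuce.core.api.StatefulRedisConnection;
    import io.lettuce.core.api.sync.RedisCommands;

    import java.util.List;
    import java.util.Set;
    import java.util.stream.Collectors;

    public class StringEncodingSketch {
        public static void main(String[] args) {
            RedisClient client = RedisClient.create("redis://localhost:6379"); // placeholder address
            try (StatefulRedisConnection<String, String> conn = client.connect()) {
                RedisCommands<String, String> redis = conn.sync();

                // write path (Kafka consumer): Long -> String
                long value = 123456789L;
                redis.sadd("some-key", Long.toString(value)); // "some-key" is a placeholder

                // read path (gRPC request): SMEMBERS, then String -> Long
                Set<String> members = redis.smembers("some-key");
                List<Long> longs = members.stream()
                        .map(Long::parseLong)
                        .collect(Collectors.toList());
            }
            client.shutdown();
        }
    }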

Using Byte Array: Using Protocol Buffers (Protobuf), I define a Protobuf message with a single int64 field. I wrap the Long integer in this message, serialize it with toByteArray(), and push the bytes to the Redis set; on read, I parse the byte array back into the message and extract the Long integer.
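
And the byte-array approach, sketched with the well-known Int64Value wrapper standing in for the single-int64-field message described above:

    import com.google.protobuf.Int64Value;
    import com.google.protobuf.InvalidProtocolBufferException;

    public class ProtobufEncodingSketch {
        // Long -> Protobuf message -> byte[] (what gets pushed to the Redis set)
        static byte[] encode(long value) {
            return Int64Value.newBuilder().setValue(value).build().toByteArray();
        }

        // byte[] pulled from the Redis set -> Protobuf message -> Long
        static long decode(byte[] bytes) throws InvalidProtocolBufferException {
            return Int64Value.parseFrom(bytes).getValue();
        }
    }

(Pushing raw bytes to Redis also means using a byte[]-based codec on the Lettuce connection, e.g. ByteArrayCodec, rather than the default String codec.)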

Are there any other, faster approaches, such as using a Lua script to fetch all the Set elements?

2 Answers


  1. I would take a step back and try to consider the entire system rather than fixating on serialization/deserialization. What is the actual bottleneck? What are you doing at that point? Are you querying individual elements inside redis? (in which case, latency is your likely limit, and you can’t change that without changing how everything interconnects) or are you fetching all the values out / inserting bulk values? In that scenario, bandwidth is more likely your limit, but if you’re doing that frequently: you probably aren’t using the redis "set" as a "set", and other options might be beneficial.

    You also don’t show exactly how you’re using your redis client: if you’re doing a large number of operations, "pipelining" is your friend to avoid per-operation latency costs. Without seeing exactly what you’re doing, it is hard to know whether you’re currently paying this cost, and whether this is in fact: the problem. Even if pipelining isn’t available in your client, batching via things like Lua might be – sending 50 (say) elements per round trip can make a huge difference in overall cost (pipelining would be better).
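
    For example: in Lettuce (which the question mentions), the pipelining equivalent is to disable auto-flush, queue async commands, and flush them in one go – a rough, untested sketch:

    import io.lettuce.core.LettuceFutures;
    import io.lettuce.core.RedisClient;
    import io.lettuce.core.RedisFuture;
    import io.lettuce.core.api.StatefulRedisConnection;
    import io.lettuce.core.api.async.RedisAsyncCommands;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;
    import java.util.concurrent.TimeUnit;

    public class PipelineSketch {
        public static void main(String[] args) {
            RedisClient client = RedisClient.create("redis://localhost:6379"); // placeholder address
            try (StatefulRedisConnection<String, String> conn = client.connect()) {
                RedisAsyncCommands<String, String> async = conn.async();

                // queue commands locally instead of flushing them one at a time
                conn.setAutoFlushCommands(false);
                List<RedisFuture<Set<String>>> futures = new ArrayList<>();
                for (String key : List.of("key1", "key2", "key3")) { // placeholder keys
                    futures.add(async.smembers(key));
                }

                // one network flush for the whole batch, then wait for the replies
                conn.flushCommands();
                LettuceFutures.awaitAll(1, TimeUnit.SECONDS, futures.toArray(new RedisFuture[0]));
            }
            client.shutdown();
        }
    }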

    The difference in ASCII vs protobuf vs CPU (fixed width / endianness) encoding can matter in some scenarios, but in this specific case: the redis protocol itself has overheads per element that are going to make that moot for these small values (unless those overheads are amortized, for example by encoding multiple values in a single payload).

    For context: I am intimately familiar with both redis and protobuf, and work in network comms on a daily basis.

  2. Redis sets are very powerful for server-side (redis) operations, but if you find yourself frequently fetching all the contents from a set: something has gone very wrong. In this case, the set element values are small, which means you will be paying a relatively large amount of protocol overhead, deframe overhead, parse overhead, etc – just to deal with how redis expresses sequences.

    However! Let’s imagine we held the N items in a redis set (for the redis unique add capability), but also in a redis binary string, of 8xN bytes – just the raw bytes of integers slammed one after the other. We could post-process the sets into the string, or we could use Lua to add the elements, for example:

    if redis.call('SADD', KEYS[1], ARGV[1]) == 1 then
        redis.call('APPEND', KEYS[2], ARGV[2])
        return true
    end
    return false
    

    With the data as a redis string, we can do a single GET; this is much simpler than SMEMBERS – and we pay virtually no protocol overhead. Equally, the receiving library doesn’t have to parse the incoming fragments into a typed array – most libraries have simple primitives for raw byte buffers. If we are overly concerned about very large fetches, we could use SUBSTR in a loop to read chunks, rather than fetching everything at once. And depending on your platform capabilities: once we have a raw binary buffer, we might be able to use "punning" (think "reinterpret cast") rather than any kind of deserialize/parse step, so that we don’t have to process the results in any way: we just take them as we see them.
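
    In Java terms, that "punning" step can be as small as wrapping the raw GET payload and viewing it as longs – a sketch, assuming the writer used little-endian byte order:

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.LongBuffer;

    public class PunningSketch {
        // View the raw GET payload (8 bytes per value) as longs, with no
        // per-element parse step; the byte order must match whoever wrote it.
        static long sum(byte[] payload) {
            LongBuffer longs = ByteBuffer.wrap(payload)
                    .order(ByteOrder.LITTLE_ENDIAN)
                    .asLongBuffer();
            long total = 0;
            while (longs.hasRemaining()) {
                total += longs.get();
            }
            return total;
        }
    }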

    Now the question becomes: is this faster? Well, the easiest way to answer such a question is to test it, so: let’s do that!

    I’m not a Java person – I’m a C# head, but we’re not testing the platform here – we’re trying to answer "could this help?", so: let’s not worry about that. Long story short (too late!), the answer is "yes". Here are my benchmark results using 50,000 elements:

    BenchmarkDotNet v0.13.12, Windows 11 (10.0.26058.1100)
    AMD Ryzen 9 5900HS with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
    .NET SDK 8.0.100
      [Host]     : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
      DefaultJob : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
    
    
    | Method     | Mean        | Error     | StdDev      | Ratio |
    |----------- |------------:|----------:|------------:|------:|
    | GetMembers | 36,127.2 us | 720.40 us | 1,596.36 us | 1.000 |
    | GetString  |    245.6 us |   1.61 us |     1.50 us | 0.007 |
    

    The SMEMBERS approach here is 147 times more expensive.

    With a sum test (to check they’re seeing the same data):

    SMEMBERS:       -1885294786466967137
    GET:            -1885294786466967137
    

    (the numbers will be different each run, but should match each other)

    Finally, here’s my entire test suite, in case I’ve done something obviously wrong:

    
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;
    using StackExchange.Redis;
    using System.Runtime.InteropServices;
    
    #if DEBUG
    // simple test
    var obj = new RedisBenchmark();
    obj.Init();
    Console.WriteLine($"SMEMBERS:t{obj.GetMembers()}");
    Console.WriteLine($"GET:tt{obj.GetString()}");
    #else
    // bench run
    BenchmarkRunner.Run<RedisBenchmark>();
    #endif
    
    public class RedisBenchmark
    {
        private ConnectionMultiplexer muxer = ConnectionMultiplexer.Connect("127.0.0.1:6379");
        private static RedisKey KeySet = "foo/set", KeyString = "foo/str";
        private IDatabase db;
    
        [GlobalSetup]
        public void Init()
        {
            if (!BitConverter.IsLittleEndian)
            {
                throw new PlatformNotSupportedException("Big-endian path not implemented");
            }
    
            // reset and load with test data
            db = muxer.GetDatabase();
            db.KeyDelete(KeySet);
            db.KeyDelete(KeyString);
            int count = 50000;
            var rand = new Random();
            byte[] buffer = new byte[8];
            ref long punned = ref MemoryMarshal.Cast<byte, long>(buffer)[0];
            RedisKey[] keys = [KeySet, KeyString];
            RedisValue[] argv = [0L, buffer];
            while (count > 0)
            {
                var value = rand.NextInt64();
                punned = value; // this actually writes to "buffer", thanks to ref-local
    
                /*
                // this is the simple version, but this has race conditions; you
                // wouldn't do this in a per-add way, but you might post-process
                // a set to create a string
                if (db.SetAdd(KeySet, value))
                {
                    db.StringAppend(KeyString, buffer);
                    count--;
                }
                */
    
                // but the fancy way is to get Lua to do the append, which works
                // in a thread-safe way at the server; we can do this per-add safely;
                // string.pack("<i8", ...) didn't work inside redis, so I'm passing
                // the bytes down, but: that's fine
                argv[0] = value;
                var added = (bool)db.ScriptEvaluate("""
                    if redis.call('SADD', KEYS[1], ARGV[1]) == 1 then
                        redis.call('APPEND', KEYS[2], ARGV[2])
                        return true
                    end
                    return false
                    """, keys, argv);
    
                if (added) count--;
            }
        }
    
    
        [Benchmark(Baseline = true)]
        public long GetMembers()
        {
            long sum = 0;
            foreach (var value in db.SetMembers(KeySet))
            {
                sum += (long)value;
            }
            return sum;
        }
    
        [Benchmark]
        public long GetString()
        {
            using var lease = db.StringGetLease(KeyString);
            if (lease is null) return 0;
    
            var punned = MemoryMarshal.Cast<byte, long>(lease.Span);
            long sum = 0;
            foreach (var value in punned)
            {
                sum += value;
            }
            return sum;
        }
    }
    

    I can’t guarantee that you’ll have the exact same results in Java with Lettuce, but: this is interesting enough to at least make it worth trying to port the experiment to find out.
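
    If you do port it: a rough, untested sketch of the write side with Lettuce, using a byte[] codec so the raw 8-byte payload can be passed straight to the script (key names are just examples):

    import io.lettuce.core.RedisClient;
    import io.lettuce.core.ScriptOutputType;
    import io.lettuce.core.api.StatefulRedisConnection;
    import io.lettuce.core.api.sync.RedisCommands;
    import io.lettuce.core.codec.ByteArrayCodec;

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.charset.StandardCharsets;

    public class DualWriteSketch {
        private static final String SCRIPT = """
                if redis.call('SADD', KEYS[1], ARGV[1]) == 1 then
                    redis.call('APPEND', KEYS[2], ARGV[2])
                    return true
                end
                return false
                """;

        public static void main(String[] args) {
            RedisClient client = RedisClient.create("redis://localhost:6379"); // placeholder address
            try (StatefulRedisConnection<byte[], byte[]> conn = client.connect(ByteArrayCodec.INSTANCE)) {
                RedisCommands<byte[], byte[]> redis = conn.sync();

                byte[][] keys = {
                        "foo/set".getBytes(StandardCharsets.UTF_8),
                        "foo/str".getBytes(StandardCharsets.UTF_8)
                };

                long value = 123456789L;
                byte[] asText = Long.toString(value).getBytes(StandardCharsets.UTF_8); // the set member
                byte[] asRaw = ByteBuffer.allocate(8)
                        .order(ByteOrder.LITTLE_ENDIAN)
                        .putLong(value)
                        .array();                                                      // appended to the string

                Boolean added = redis.eval(SCRIPT, ScriptOutputType.BOOLEAN, keys, asText, asRaw);
                System.out.println("added: " + added);
            }
            client.shutdown();
        }
    }

    The read side is then just a GET on the string key plus the ByteBuffer/LongBuffer view shown earlier, instead of SMEMBERS plus a per-element parse.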


    Since you mention gRPC, it is also worth noting that if your schema uses a fixed-width type for the values – for example repeated sfixed64 values = ... – then the "packed" wire encoding is exactly this kind of raw byte buffer, so there is a sneaky way to present it with zero translation again, all the way from redis to gRPC.
