I am using Redis as my cache server. For clarity, I am storing key-value pairs like `'S0007226_2005-07-09': '[15.3462, -1]'`. The queries are for specific keys, not ranges. For querying, I am using the pyredis client.

I frequently have to MGET ~1 million keys from the cache. This kind of query is too heavy for Redis and takes up to 10 seconds. The catch here is that MGET for n keys is an O(n) operation (n being the number of keys in the query). I have added a table of query times from the logs.

| Keys   | Time (ms) |
|--------|-----------|
| 703732 | 6869.66   |
| 26806  | 277.21    |
| 13180  | 137.41    |
| 400    | 5.83      |
| 2589   | 29.04     |
| 180    | 3.6       |
| 98413  | 1009.84   |
| 151994 | 1524.12   |

This seems quite normal: the time grows linearly (O(n)) with the number of keys. I am also using a Redis pipeline, breaking the list of keys into chunks of 10K.
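For reference, the chunked-pipeline fetch described above can be sketched like this. It assumes a redis-py-style client whose `pipeline()` buffers commands and whose `execute()` returns one result list per buffered command; the function name and the 10K default are illustrative:

```python
def chunked_mget(client, keys, chunk_size=10_000):
    """Fetch many keys by batching MGETs of `chunk_size` keys each into
    a single pipeline, so the whole fetch costs a few round trips instead
    of one per chunk. Returns values in the same order as `keys`."""
    pipe = client.pipeline(transaction=False)
    for i in range(0, len(keys), chunk_size):
        pipe.mget(keys[i:i + chunk_size])  # one MGET command per chunk
    values = []
    for chunk_result in pipe.execute():   # one result list per MGET
        values.extend(chunk_result)
    return values
```

The pipeline only saves network round trips; the server-side O(n) work of looking up n keys remains, which is why the times above still scale linearly.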

I want to reduce the query time to ~1 s or less. If it were not Redis, I could have requested in parallel and merged the results. But given that Redis runs on a single core, that is not a viable option as I understand it. The possible ways out:

  1. Go for some design change where I don’t have to query a Million keys in the first place.
  2. Use some other tool instead of Redis to handle the load.
  3. Some optimisation in the present setup itself to handle it better.

Suppose I have to choose between options 2 and 3. What are my options? Should I try some other caching server designed for higher throughput, or is there some optimisation I can do, either in the query/storage or in the setup, to get better results?

Answers


  1. "If it was not Redis, I could have tried to request in parallel and merge the results."

    You can still request in parallel. Create a multi-master setup and shard/distribute your keys across the masters. You can then request data from the masters in parallel.

    I can also tell you from experience that there's nothing faster than Redis for this workload, as it is an entirely in-memory, single-threaded process. So option #2 in your question is unlikely to help.

    I would rather change the design, i.e. option #1. If not, then do a multi-master setup and request in parallel.
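    One way to sketch that multi-master idea: route each key to a shard with a stable hash, issue the per-shard MGETs in parallel threads, and stitch results back into the original order. Everything here is illustrative (the function names, the CRC32 routing, the assumption that each shard is a client object with an `mget` method) and not from the answer itself:

    ```python
    import zlib
    from concurrent.futures import ThreadPoolExecutor

    def shard_of(key, n_shards):
        """Stable hash so reads route to the same shard the key was written to."""
        return zlib.crc32(key.encode()) % n_shards

    def sharded_mget(clients, keys):
        """Split the key list across shards, MGET each shard in its own
        thread, then place values back in the original key order."""
        n = len(clients)
        buckets = [[] for _ in range(n)]
        for pos, key in enumerate(keys):
            buckets[shard_of(key, n)].append((pos, key))
        out = [None] * len(keys)

        def fetch(i):
            if not buckets[i]:
                return
            positions = [p for p, _ in buckets[i]]
            shard_keys = [k for _, k in buckets[i]]
            for pos, val in zip(positions, clients[i].mget(shard_keys)):
                out[pos] = val

        with ThreadPoolExecutor(max_workers=n) as ex:
            list(ex.map(fetch, range(n)))
        return out
    ```

    With m masters, each one handles roughly n/m keys, so the O(n) server-side cost is split m ways.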

  2. I don't think you should be querying 1M keys at the same time. You should build a two-level cache: an in-process cache in front of the Redis cache.

    You should query like:

    • Search in the local cache first
    • Query Redis only for the keys that are not there
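    A minimal sketch of that two-level lookup, assuming `local_cache` is a plain dict and `redis_client` exposes an `mget` method (the function name is my own):

    ```python
    def two_level_get(local_cache, redis_client, keys):
        """Serve hits from the in-process dict; fetch only the misses
        from Redis in a single MGET, and backfill the local cache with
        whatever came back so the next call hits locally."""
        found = {k: local_cache[k] for k in keys if k in local_cache}
        misses = [k for k in keys if k not in found]
        if misses:
            for k, v in zip(misses, redis_client.mget(misses)):
                if v is not None:
                    local_cache[k] = v  # backfill the local cache
                    found[k] = v
        return [found.get(k) for k in keys]
    ```

    After a warm-up, repeated queries over mostly the same keys send only a small fraction of the 1M keys over the wire.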

    Always use a TTL; it helps distribute key refreshes over time. If you think many keys might expire at the same moment, add a random delta to the TTL.
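    For example, jittering the TTL on write could look like this (a sketch assuming a redis-py-style `setex(name, time, value)`; the function name and the default numbers are arbitrary):

    ```python
    import random

    def set_with_jitter(client, key, value, base_ttl=3600, jitter=300):
        """Write with a TTL spread over [base_ttl, base_ttl + jitter] so
        a batch of keys written together does not all expire in the same
        instant and trigger a burst of simultaneous misses."""
        client.setex(key, base_ttl + random.randint(0, jitter), value)
    ```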

    Even after doing this, if you still see a performance issue with a single Redis node, then use a master-replica setup. Given the number of keys you have, you would need more than 10 shards or so.
