I am using Redis as my cache server. For clarity, I am storing key-value pairs like `'S0007226_2005-07-09': '[15.3462, -1]'`. The queries are for specific keys, not ranges. For querying, I am using the pyredis client.
I frequently have to MGET ~1 million keys from the cache. This kind of query is too heavy for Redis and takes up to 10 seconds. The catch here is that MGET for n keys is an O(n) operation (n being the number of keys in the query). I have added a table of query times from the logs.
| Keys   | Time (ms) |
|--------|-----------|
| 703732 | 6869.66 |
| 26806  | 277.21 |
| 13180  | 137.41 |
| 400    | 5.83 |
| 2589   | 29.04 |
| 180    | 3.6 |
| 98413  | 1009.84 |
| 151994 | 1524.12 |
This seems normal: the time grows linearly with the number of keys, i.e. O(n). Also, I am using a Redis pipeline, breaking the list of keys into chunks of 10K.
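The chunked-pipeline approach described above can be sketched like this. The function and variable names are illustrative; in production `client` would be a redis-py `redis.Redis` instance, but the sketch only assumes an object with a compatible `pipeline()`/`mget()` interface:

```python
CHUNK_SIZE = 10_000  # chunk size from the question; tune as needed

def mget_chunked(client, keys, chunk_size=CHUNK_SIZE):
    """Fetch `keys` via pipelined MGETs of `chunk_size` keys each,
    then merge the per-chunk results into one flat list.

    `client` is any redis-py-compatible client, e.g. redis.Redis(host=...).
    """
    pipe = client.pipeline(transaction=False)  # no MULTI/EXEC overhead
    for i in range(0, len(keys), chunk_size):
        pipe.mget(keys[i:i + chunk_size])
    # execute() returns one result list per queued MGET; flatten them
    return [value for chunk in pipe.execute() for value in chunk]
```

Note that this still sends every chunk to the same single-threaded server, so it bounds per-request payload size but not the total O(n) work Redis performs.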
I want to reduce the query time to ~1 s or less. If it were not Redis, I could have requested in parallel and merged the results, but given that Redis can only work on a single core, that is not a viable option as I understand it. The possible ways out:
- Go for some design change so that I don't have to query a million keys in the first place.
- Use some other tool instead of Redis to handle the load.
- Some optimisation in the present setup itself to handle it better.
Suppose I have to choose between options 2 and 3. What are my options? Should I try another caching server designed for higher throughput, or is there some optimisation I can make, either in the query/storage or in the setup, to get better results?
2 Answers
> "If it were not Redis, I could have requested in parallel and merged the results."
You can still request in parallel. Create a multi-master setup and shard/distribute your keys across the masters. You can then request data from the masters in parallel.
I can also tell you from experience that there is little faster than Redis for this workload, as it is an entirely in-memory, single-threaded process. So option 2 in your question is unlikely to help.

I would rather change the design, i.e. go with option 1. Failing that, do a multi-master setup and request in parallel.
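The sharded parallel fetch suggested above might look like the following sketch. All names are illustrative; `shard_clients` is assumed to be a list of redis-py-compatible clients, one per master, and the key-to-shard routing uses a stable CRC32 hash so that writers and readers agree on placement:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def sharded_mget(shard_clients, keys):
    """Route each key to a shard by CRC32, MGET every shard in parallel,
    and reassemble the values in the original key order."""
    n = len(shard_clients)
    # bucket (original_index, key) pairs per shard
    buckets = [[] for _ in range(n)]
    for idx, key in enumerate(keys):
        buckets[zlib.crc32(key.encode()) % n].append((idx, key))

    def fetch(shard_idx):
        pairs = buckets[shard_idx]
        if not pairs:
            return []
        values = shard_clients[shard_idx].mget([k for _, k in pairs])
        return [(i, v) for (i, _), v in zip(pairs, values)]

    results = [None] * len(keys)
    # one thread per shard; each thread blocks on I/O to its own master
    with ThreadPoolExecutor(max_workers=n) as pool:
        for fetched in pool.map(fetch, range(n)):
            for i, v in fetched:
                results[i] = v
    return results
```

Each thread waits on its own master, so the O(n) server-side work is split n ways instead of serialised on one core.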
I don't think you should be querying 1M keys at the same time. You should build a two-level cache: an in-memory cache in front of the Redis cache, and query the in-memory layer first, falling back to Redis on a miss.

Always use a TTL. A TTL helps distribute key queries over time; if you think many keys might expire at the same moment, add a random delta to the TTL.
If you still see a performance issue with a single Redis node after doing this, use a master-replica setup. Given the number of keys you have, you would need 10 or so shards.
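The "random delta" advice above can be sketched as follows. The names and the base TTL are illustrative, and `client` is assumed to be a redis-py-compatible client whose `set()` accepts an `ex` (expiry in seconds) argument:

```python
import random

BASE_TTL = 3600   # illustrative base TTL, in seconds
JITTER = 300      # illustrative spread, in seconds

def set_with_jittered_ttl(client, key, value, base_ttl=BASE_TTL, jitter=JITTER):
    """SET a key with a TTL drawn from [base_ttl, base_ttl + jitter].

    Keys written together then expire at slightly different times, so
    their refill queries are spread out instead of arriving in one burst.
    """
    ttl = base_ttl + random.randint(0, jitter)
    client.set(key, value, ex=ttl)
    return ttl
```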