Hello, I have a Redis database that contains facial embeddings of 100k+ people. All of them are stored in Redis as key-value pairs. For example:
{
"embedding:angelina" : [128-D vector of angelina],
"embedding:emma" : [128-D vector of emma],
"embedding:dicaprio" : [128-D vector of dicaprio]
}
Now I am trying to compare a target embedding with all of the embeddings in my dataset to find the best match. My current approach is to first retrieve all keys matching the `embedding*` pattern, then iterate over those embeddings and compute each one's distance to the target embedding. If the distance is below a threshold, I append it to a list, and finally I choose the shortest distance from that list.
I'm not sure, but I have a feeling this is not best practice. I would be glad if someone could help me find a better approach.
Note: I know Elasticsearch is a great candidate for such tasks, but I need to stick with Redis for now.
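The matching loop described in the question can be sketched as a pure function. This is a minimal sketch, assuming the embeddings have already been loaded into a dict of name to vector and that Euclidean distance is the metric; `best_match` and its parameters are illustrative names, not code from the post.

```python
import math
from typing import Mapping, Optional, Sequence, Tuple


def best_match(
    target: Sequence[float],
    embeddings: Mapping[str, Sequence[float]],
    threshold: float,
) -> Optional[Tuple[str, float]]:
    """Return (name, distance) of the closest embedding below `threshold`, or None."""
    candidates = []
    for name, vector in embeddings.items():
        distance = math.dist(target, vector)  # Euclidean distance (Python 3.8+)
        if distance < threshold:
            candidates.append((name, distance))
    # Choose the shortest distance among the candidates that passed the threshold.
    return min(candidates, key=lambda item: item[1], default=None)
```

Each query touches every stored vector, so the cost grows linearly with the 100k+ entries no matter how the keys are fetched.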
Answers
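Iterating over Redis keys by pattern is possible, but it's not best practice. The Redis docs warn that `KEYS` should only be used in production environments with extreme care, because it may ruin performance when executed against large databases.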
Using `SCAN` will protect the Redis instance's resources, but it will still take you a long time and many requests to get all the keys in a large dataset.

Some workarounds come to mind, depending on your situation:
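- … `angelina` in the value if it's important.
- Whenever you add or remove an `embedding:.*` key in the dataset, also use `SADD` and `SREM` to add or remove the key name in a Set (which could be named e.g. `embedding_sample_keys`). I haven't tried this, but it sounds pretty viable.
- Store the embeddings in a single hash structure (possibly the key would be `embedding_data`). This has downsides, like not being able to set a distinct TTL for each cache key (at least before Redis 7.4, which added per-field expiration with `HEXPIRE`). You can use `HKEYS` and `HSCAN` to access all the keys in a hash, which might be an improvement over scanning the entire dataset.

This sounds like a good candidate for Redis VSS (vector similarity search), which lets Redis itself find the embeddings nearest to a query vector instead of the client fetching and comparing every one.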
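The Set-index workaround from the list could look roughly like this with redis-py. It is a sketch, not the answerer's code: it assumes each vector is stored as a JSON string under an `embedding:<name>` key, reuses the example Set name `embedding_sample_keys`, and the helper names `add_embedding`, `remove_embedding` and `load_embeddings` are hypothetical. `client` is assumed to be a redis-py `Redis` instance created with `decode_responses=True`.

```python
import json
from typing import Dict, List, Sequence

INDEX_KEY = "embedding_sample_keys"  # example Set name from the answer


def add_embedding(client, name: str, vector: Sequence[float]) -> None:
    key = f"embedding:{name}"
    # MULTI/EXEC so the value and its index entry are written together.
    pipe = client.pipeline(transaction=True)
    pipe.set(key, json.dumps(list(vector)))
    pipe.sadd(INDEX_KEY, key)
    pipe.execute()


def remove_embedding(client, name: str) -> None:
    key = f"embedding:{name}"
    pipe = client.pipeline(transaction=True)
    pipe.delete(key)
    pipe.srem(INDEX_KEY, key)
    pipe.execute()


def load_embeddings(client, batch_size: int = 1000) -> Dict[str, List[float]]:
    """Read every indexed embedding without scanning the whole keyspace."""
    result: Dict[str, List[float]] = {}
    batch: List[str] = []

    def flush() -> None:
        # One MGET round trip per batch instead of one GET per key.
        for key, raw in zip(batch, client.mget(batch)):
            if raw is not None:  # the key may have been deleted since it was indexed
                result[key.split(":", 1)[1]] = json.loads(raw)
        batch.clear()

    # SSCAN walks only the index Set; it may yield a member twice, the dict dedupes.
    for key in client.sscan_iter(INDEX_KEY, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            flush()
    if batch:
        flush()
    return result
```

This still transfers every vector to the client on each query; it only avoids walking unrelated keys.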
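For the Redis VSS route, vector search comes from the RediSearch module (version 2.4 or later, bundled in Redis Stack), which indexes vectors stored as raw FLOAT32 bytes in hash fields. The sketch below only builds the `FT.CREATE` and `FT.SEARCH` argument lists to send through redis-py's generic `execute_command`. It assumes the embeddings are migrated to hashes named `embedding:<name>` with a binary field `vec`, and a little-endian server (typical x86/ARM hosts); the index name `idx:faces` and the helper names are hypothetical.

```python
import struct
from typing import List, Sequence, Union

Arg = Union[str, int, bytes]


def to_float32_bytes(vector: Sequence[float]) -> bytes:
    # Pack as little-endian FLOAT32; the blob must be exactly DIM * 4 bytes.
    return struct.pack(f"<{len(vector)}f", *vector)


def create_index_args(index: str = "idx:faces", dim: int = 128) -> List[Arg]:
    # FLAT = exact nearest-neighbour search inside Redis; HNSW would be approximate.
    # The 6 counts the attribute tokens: TYPE FLOAT32 DIM <dim> DISTANCE_METRIC L2.
    return [
        "FT.CREATE", index, "ON", "HASH", "PREFIX", 1, "embedding:",
        "SCHEMA", "vec", "VECTOR", "FLAT", 6,
        "TYPE", "FLOAT32", "DIM", dim, "DISTANCE_METRIC", "L2",
    ]


def knn_search_args(target: Sequence[float], k: int = 1,
                    index: str = "idx:faces") -> List[Arg]:
    # $query_vec is bound through PARAMS; KNN queries need DIALECT 2.
    return [
        "FT.SEARCH", index, f"*=>[KNN {k} @vec $query_vec AS dist]",
        "RETURN", 1, "dist",
        "SORTBY", "dist",
        "PARAMS", 2, "query_vec", to_float32_bytes(target),
        "DIALECT", 2,
    ]

# Usage with a redis-py client (not executed here):
# client.hset("embedding:angelina", mapping={"vec": to_float32_bytes(vector)})
# client.execute_command(*create_index_args())
# reply = client.execute_command(*knn_search_args(target_embedding, k=1))
```

Redis then returns only the nearest key and its `dist` score, so the 100k+ vectors never leave the server. Check how the reported L2 score relates to your existing threshold before reusing it.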