
I’m trying to power some multi-selection query and filter operations with SCAN over my data, and I’m not sure I’m heading in the right direction.

I am using AWS ElastiCache (Redis 5.0.6).

Key design: <recipe id>:<recipe name>:<recipe type>:<country of origin>

Example:

13434:Guacamole:Dip:Mexico
34244:Gazpacho:Soup:Spain
42344:Paella:Dish:Spain
23444:HotDog:StreetFood:USA
78687:CustardPie:Dessert:Portugal
75453:Churritos:Dessert:Spain

If I want to power queries with complex multi-selection filters (for example, returning all keys matching five recipe types from two different countries), which the SCAN glob-style MATCH pattern can’t handle, what is the common way to go about it in a production scenario?

Assume I calculate all possible patterns by taking the Cartesian product of the alternatives in each field:

[[Guacamole, Gazpacho], [Soup, Dish, Dessert], [Portugal]]

*:Guacamole:Soup:Portugal
*:Guacamole:Dish:Portugal
*:Guacamole:Dessert:Portugal
*:Gazpacho:Soup:Portugal
*:Gazpacho:Dish:Portugal
*:Gazpacho:Dessert:Portugal
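Expanding the filter selections into scan patterns is a straightforward Cartesian product; a minimal standalone sketch in plain Java (names are mine, not from the code below):

```java
import java.util.ArrayList;
import java.util.List;

public class PatternExpander {

    // Expand one alternative-list per field into all "*:name:type:country" patterns.
    static List<String> expandPatterns(List<List<String>> fields) {
        List<String> patterns = new ArrayList<>();
        patterns.add("*"); // seed with the wildcard for the recipe-id field
        for (List<String> alternatives : fields) {
            List<String> next = new ArrayList<>();
            for (String prefix : patterns)
                for (String alt : alternatives)
                    next.add(prefix + ":" + alt);
            patterns = next;
        }
        return patterns;
    }

    public static void main(String[] args) {
        List<List<String>> filters = List.of(
                List.of("Guacamole", "Gazpacho"),
                List.of("Soup", "Dish", "Dessert"),
                List.of("Portugal"));
        expandPatterns(filters).forEach(System.out::println);
    }
}
```

The pattern count is the product of the alternative counts per field (2 × 3 × 1 = 6 here), which is why the multi-SCAN approach degrades quickly as filters widen.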

What mechanism should I use to implement this sort of pattern matching in Redis?

  1. Do multiple SCAN for each scannable pattern sequentially and merge the results?
  2. A Lua script that applies improved pattern matching to each scanned key server-side, returning all matching keys from a single SCAN loop?
  3. An index built on top of sorted sets supporting fast lookups of keys matching single fields, solving alternation within a field with ZUNIONSTORE and intersection across fields with ZINTERSTORE?

<recipe name>:: => key1, key2, keyN
:<recipe type>: => key1, key2, keyN
::<country of origin> => key1, key2, keyN

  4. An index built on top of sorted sets supporting fast lookups of keys matching all dimensional combinations, thereby avoiding unions and intersections at the cost of more storage and a larger index keyspace footprint?

<recipe name>:: => key1, key2, keyN
<recipe name>:<recipe type>: => key1, key2, keyN
<recipe name>::<country of origin> => key1, key2, keyN
:<recipe type>: => key1, key2, keyN
:<recipe type>:<country of origin> => key1, key2, keyN
::<country of origin> => key1, key2, keyN
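To see how option 4's keyspace grows, here is a hypothetical sketch (my own helper, not from the question's code) that enumerates one index-key template per non-empty, non-full subset of the fields: 2^n − 2 index keys per record for n fields, i.e. 6 for the 3 fields above.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexCombinations {

    // For fields [name, type, country], emit one index-key template per
    // non-empty, non-full subset, keeping empty positions blank so the
    // colon layout stays fixed, e.g. "Guacamole::" or ":Dip:Mexico".
    static List<String> indexKeys(String[] fields) {
        List<String> keys = new ArrayList<>();
        int full = (1 << fields.length) - 1;
        for (int mask = 1; mask < full; mask++) {   // skip empty and full sets
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) sb.append(':');
                if ((mask & (1 << i)) != 0) sb.append(fields[i]);
            }
            keys.add(sb.toString());
        }
        return keys;
    }

    public static void main(String[] args) {
        indexKeys(new String[]{"Guacamole", "Dip", "Mexico"}).forEach(System.out::println);
    }
}
```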

  5. Leverage RediSearch? (Not possible for my use case, but see Tug Grall's answer, which appears to be a very nice solution.)
  6. Something else?

I’ve implemented 1) and performance is awful.

private static HashSet<String> redisScan(Jedis jedis, String pattern, int scanLimitSize) {

    // COUNT is only a per-iteration work hint; MATCH filters server-side.
    ScanParams params = new ScanParams().count(scanLimitSize).match(pattern);

    ScanResult<String> scanResult;
    List<String> keys;
    String nextCursor = "0";
    HashSet<String> allMatchedKeys = new HashSet<>();

    // Iterate the cursor until it wraps back to "0" (full keyspace covered).
    do {
        scanResult = jedis.scan(nextCursor, params);
        keys = scanResult.getResult();
        allMatchedKeys.addAll(keys);
        nextCursor = scanResult.getCursor();
    } while (!nextCursor.equals("0"));

    return allMatchedKeys;

}

private static HashSet<String> redisMultiScan(Jedis jedis, ArrayList<String> patternList, int scanLimitSize) {

    HashSet<String> mergedHashSet = new HashSet<>();
    for (String pattern : patternList)
        mergedHashSet.addAll(redisScan(jedis, pattern, scanLimitSize));

    return mergedHashSet;
}

For 2) I’ve created a Lua script to run the SCAN server-side. Performance is not brilliant, but it is much faster than 1), even though Lua patterns don’t support alternation and I have to loop each key through a pattern list for validation:

-- 'pats' is a '|'-separated list of Lua patterns; return the first match.
local function MatchAny( str, pats )
    for pat in string.gmatch(pats, '([^|]+)') do
        local w = string.match( str, pat )
        if w then return w end
    end
end

-- ARGV[1]: SCAN COUNT hint
-- ARGV[2]: SCAN MATCH glob-style pattern (coarse server-side pre-filter)
-- ARGV[3]: '|'-separated Lua patterns used to validate each scanned key

local cur = 0
local rep = {}
local tmp

repeat
  tmp = redis.call("SCAN", cur, "MATCH", ARGV[2], "COUNT", ARGV[1])
  cur = tonumber(tmp[1])
  for _, v in ipairs(tmp[2]) do
    if MatchAny(v, ARGV[3]) then
      rep[#rep+1] = v
    end
  end
until cur == 0
return rep

Called in such a fashion (luaSha being the SHA-1 digest previously returned by jedis.scriptLoad):

private static ArrayList<String> redisLuaMultiScan(Jedis jedis, String luaSha, List<String> KEYS, List<String> ARGV) {
    Object response = jedis.evalsha(luaSha, KEYS, ARGV);
    if(response instanceof List<?>)
        return (ArrayList<String>) response;
    else
        return new ArrayList<>();
}  

For 3) I’ve implemented and maintained a secondary index for each of the 3 fields using sorted sets, and implemented querying with alternation within single fields and multi-field filters like this:

private static Set<String> redisIndexedMultiPatternQuery(Jedis jedis, ArrayList<ArrayList<String>> patternList) {

    ArrayList<String> unionedSets = new ArrayList<>();
    String keyName;
    Pipeline pipeline = jedis.pipelined();

    for (ArrayList<String> subPatternList : patternList) {
        if (subPatternList.isEmpty()) continue;
        keyName = "un:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
        pipeline.zunionstore(keyName, subPatternList.toArray(new String[0]));
        unionedSets.add(keyName);
    }

    String[] unionArray = unionedSets.toArray(new String[0]);
    keyName = "in:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
    pipeline.zinterstore(keyName, unionArray);
    Response<Set<String>> response = pipeline.zrange(keyName, 0, -1);
    pipeline.del(unionArray);
    pipeline.del(keyName);
    pipeline.sync();

    return response.get();
}

The results of my stress test cases clearly favor 3) in terms of request latency:

[Stress-test request-latency chart omitted.]

2 Answers


  1. Chosen as BEST ANSWER

    I ended up using a simple strategy to update each secondary index for each field when the key is created:

    protected static void setKeyAndUpdateIndexes(Jedis jedis, String key, String value, int idxDimSize) {
        String[] key_arr = key.split(":");
        Pipeline pipeline = jedis.pipelined();
    
        pipeline.set(key, value);
        for (int y = 0; y < key_arr.length; y++)
            pipeline.zadd(
                    "idx:" +
                        StringUtils.repeat(":", y) +
                        key_arr[y] +
                        StringUtils.repeat(":", idxDimSize-y),
                    java.time.Instant.now().getEpochSecond(),
                    key);
    
        pipeline.sync();
    }
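    To make the key layout concrete, here is a standalone reproduction of the index-key naming logic above (no Redis calls; StringUtils.repeat swapped for String.repeat, available since Java 11):

```java
import java.util.ArrayList;
import java.util.List;

public class IndexKeyDemo {

    // Mirrors the zadd key construction in setKeyAndUpdateIndexes:
    // one index key per field position, padded with ':' on both sides
    // so each field keeps its slot in the colon layout.
    static List<String> indexKeysFor(String key, int idxDimSize) {
        String[] parts = key.split(":");
        List<String> keys = new ArrayList<>();
        for (int y = 0; y < parts.length; y++)
            keys.add("idx:" + ":".repeat(y) + parts[y] + ":".repeat(idxDimSize - y));
        return keys;
    }

    public static void main(String[] args) {
        // For "13434:Guacamole:Dip:Mexico" with idxDimSize = 3 this yields:
        // idx:13434:::  idx::Guacamole::  idx:::Dip:  idx::::Mexico
        indexKeysFor("13434:Guacamole:Dip:Mexico", 3).forEach(System.out::println);
    }
}
```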
    

    The search strategy to find multiple keys that match a pattern, including alternation and multi-field filters, looks like this:

    private static Set<String> redisIndexedMultiPatternQuery(Jedis jedis, ArrayList<ArrayList<String>> patternList) {
    
        ArrayList<String> unionedSets = new ArrayList<>();
        String keyName;
        Pipeline pipeline = jedis.pipelined();
    
        for (ArrayList<String> subPatternList : patternList) {
            if (subPatternList.isEmpty()) continue;
            keyName = "un:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
            pipeline.zunionstore(keyName, subPatternList.toArray(new String[0]));
            unionedSets.add(keyName);
        }
    
        String[] unionArray = unionedSets.toArray(new String[0]);
        keyName = "in:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
        pipeline.zinterstore(keyName, unionArray);
        Response<Set<String>> response = pipeline.zrange(keyName, 0, -1);
        pipeline.del(unionArray);
        pipeline.del(keyName);
        pipeline.sync();
    
        return response.get();
    }
    

  2. I would vote for option 3, but I would probably start to use RediSearch.

    Have you also looked at RediSearch? This module allows you to create a secondary index and run complex queries and full-text search.

    This may simplify your development.

    I invite you to look at the project and its Getting Started guide.

    Once installed you will be able to achieve it with the following commands:

    HSET recipe:13434 name "Guacamole" type "Dip" country "Mexico"
    HSET recipe:34244 name "Gazpacho" type "Soup" country "Spain"
    HSET recipe:42344 name "Paella" type "Dish" country "Spain"
    HSET recipe:23444 name "Hot Dog" type "StreetFood" country "USA"
    HSET recipe:78687 name "Custard Pie" type "Dessert" country "Portugal"
    HSET recipe:75453 name "Churritos" type "Dessert" country "Spain"

    FT.CREATE idx:recipe ON HASH PREFIX 1 recipe: SCHEMA name TEXT SORTABLE type TAG SORTABLE country TAG SORTABLE

    FT.SEARCH idx:recipe "@type:{Dessert}"
    FT.SEARCH idx:recipe "@type:{Dessert} @country:{Spain}" RETURN 1 name
    FT.AGGREGATE idx:recipe "*" GROUPBY 1 @type REDUCE COUNT 0 as nb_of_recipe


    I am not explaining all the commands in detail here, since you can find the explanations in the tutorial, but here are the basics:

    • use a hash to store the recipes
    • create a RediSearch index and index the fields you want to query
    • Run queries, for example:
      • To get all Spanish desserts: FT.SEARCH idx:recipe "@type:{Dessert} @country:{Spain}" RETURN 1 name
      • To count the number of recipe by type: FT.AGGREGATE idx:recipe "*" GROUPBY 1 @type REDUCE COUNT 0 as nb_of_recipe