skip to Main Content

I need to match tens of thousands of 4 byte strings with about one or more boolean values. I don’t mind using up a whole word for the booleans if it means faster retrieval. However I have such tight constraints for my data I imagine there is some, albeit minor, optimizations that can be made if these are reported to the storage engine in advance. Does Redis have any way to take advantage of this?

Here is a sample of my data:

"DENL": false
"NLES": false
"NLUS": true
"USNL": true
"AEGB": true
"ITAE": true
"ITFR": false

The keys are the concatination of two ISO 3166-1 alpha-2 codes. As such they are guaranteed to be 4 uppercase English letters.

The data structures I have considered using are:

  • Hashes to map the 4 byte keys to a string representing the booleans
  • A separate set for each boolean value

And since my data only contains uppercase English letters and there are only 456976 possible combinations of those (which comes out to 56KB per bit stored per key):

  • One or more strings that are accessed with bitwise operations (GETBIT, BITFIELD) using a function to convert the key string to a bit index.

I think that sets are probably the most elegant solution and a binary string over all possible combinations will be the most efficient. I would like to know wheter there is there some kind of middle ground? Like a set with fixed length strings as members. I would expect a datatype optimized for fixed length strings would be able to provide faster searching than one optimized for variable length strings.

2

Answers


  1. There are a couple of optimizations you could try:

    1. use a set and treat all values as either “need customs declaration” or “does not need a customs declaration” – depending which one has fewer values; then with SISMEMBER you can check if your key is in that set which gives you the correct answer,
    2. have a look at introduction to Redis data types, chapter “Bitmaps” – if you pre-define all of your keys in some array you can use SETBIT and GETBIT operations to store the flag “needs customs declaration” for given bit number (index in array).
    Login or Signup to reply.
  2. It is slightly better to use the 4-letter country-code-combination as a simple key, with an empty value.

    The set data-type is really a hash map where the keys are the element and are added to the hash map with NULL value. I wouldn’t use a set as this implies to hashes and two lookups into a hash map: the first for the set key in the database and the second for the hash internal to the set for the element.

    Use the existence of the key as either “need customs declaration” or “does not need a customs declaration” as Tomasz says.

    Using simple keys allows you to use the SET command with NX/XX conditions, which may be handy in your logic:

    • NX — Only set the key if it does not already exist.
    • XX — Only set the key if it already exists.

    Use EXISTS command instead of GET as it is slightly faster (no type checking, no value fetching).

    Another advantage of simple keys vs sets is to get the value of multiple keys at once using MGET:

    > MGET DENL NLES NLUS
    1) ""
    2) ""
    3) (nil)
    

    To be able to do complex queries, assuming these are rare and not optimized for performance, you can use SSCAN (if you go with sets) or KEYS (if you go with simple keys). However, if you go with simple keys you better use a dedicated database, see SELECT.

    To query for those with NL on the left side you would use:

    KEYS NL??
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search