I need to match tens of thousands of 4 byte strings with about one or more boolean values. I don’t mind using up a whole word for the booleans if it means faster retrieval. However I have such tight constraints for my data I imagine there is some, albeit minor, optimizations that can be made if these are reported to the storage engine in advance. Does Redis have any way to take advantage of this?
Here is a sample of my data:
"DENL": false
"NLES": false
"NLUS": true
"USNL": true
"AEGB": true
"ITAE": true
"ITFR": false
The keys are the concatination of two ISO 3166-1 alpha-2 codes. As such they are guaranteed to be 4 uppercase English letters.
The data structures I have considered using are:
- Hashes to map the 4 byte keys to a string representing the booleans
- A separate set for each boolean value
And since my data only contains uppercase English letters and there are only 456976 possible combinations of those (which comes out to 56KB per bit stored per key):
- One or more strings that are accessed with bitwise operations (GETBIT, BITFIELD) using a function to convert the key string to a bit index.
I think that sets are probably the most elegant solution and a binary string over all possible combinations will be the most efficient. I would like to know wheter there is there some kind of middle ground? Like a set with fixed length strings as members. I would expect a datatype optimized for fixed length strings would be able to provide faster searching than one optimized for variable length strings.
2
Answers
There are a couple of optimizations you could try:
It is slightly better to use the 4-letter country-code-combination as a simple key, with an empty value.
The set data-type is really a hash map where the keys are the element and are added to the hash map with NULL value. I wouldn’t use a set as this implies to hashes and two lookups into a hash map: the first for the set key in the database and the second for the hash internal to the set for the element.
Use the existence of the key as either “need customs declaration” or “does not need a customs declaration” as Tomasz says.
Using simple keys allows you to use the SET command with NX/XX conditions, which may be handy in your logic:
Use
EXISTS
command instead ofGET
as it is slightly faster (no type checking, no value fetching).Another advantage of simple keys vs sets is to get the value of multiple keys at once using
MGET
:To be able to do complex queries, assuming these are rare and not optimized for performance, you can use
SSCAN
(if you go with sets) or KEYS (if you go with simple keys). However, if you go with simple keys you better use a dedicated database, seeSELECT
.To query for those with NL on the left side you would use: