
I have 10 million+ (and growing) users, each with an email address and a phone number, both pointing to a user ID. I created two hashes, one for emails and one for phone numbers, like this:

//A single user with Both Email and Phone number pointing to same User ID
$redis->hSet('email-users', '[email protected]', 1);
$redis->hSet('phone-users', '+192938384849', 1);

Now that there are millions of users, the hashes are growing very large, and I also want to search through them. For example, I want to get the user ID for an email from the email-users hash.

As I found in "Redis: best way to store a large map (dictionary)", hashes should be kept small enough to use the ziplist encoding, which means dividing them into smaller buckets of a fixed size, say at most 10000 keys in a single hash.

So, if I divide my 10 Million Users into buckets of 10000 keys there would be around 1000 Hashes for Emails and 1000 for Phone numbers.

My question is: should I divide my users into these 1000 buckets? If yes, how can I search through these 1000 buckets? Or is there a better alternative?

P.S. I am using PHP, and fetching all 1000 hashes and looping through them could be quite resource intensive. I am afraid that using the wrong approach would also kill the performance Redis is capable of.

As a side note, I think we could use an algorithm like libketama for consistent hashing to place keys on different servers.

Also, if it is hard to work with letters, we can convert each email to numbers first, like a=1, b=2, c=3 … z=26, with a 0 (zero) appended to each to make it unique, and + for the @ and . characters. For example:

[email protected]  ->  10203040+901301090+3015013

So, now we have numbers which make it easier to apply any calculations.

2 Answers


  1. What you can do is distribute the keys according to their first letter or digit, or their first couple of letters/digits.

    You can create your hashes like this, using the first letter of the email and the first one or two digits of the phone number:

    • email-users-a
    • email-users-b
    • email-users-c
    • phone-users-10
    • phone-users-11

    You handle this routing at the code level whenever you do hset/hget.

    Edit:

    Let’s say we use the first two digits for phone numbers and the first two letters for emails;

    then we will have keys like the following:

    • email-users-aa
    • email-users-ab
    • phone-users-11
    • phone-users-12

    When we have an email like [email protected], we go to the er email hash group, which is email-users-er, and execute hget email-users-er [email protected].

    When we have a phone number like 123456789, we go to the 12 phone hash group, which is phone-users-12, and execute hget phone-users-12 123456789.
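The routing described in this answer can be sketched in Java as follows. This is only an illustration under assumptions: the class and method names are hypothetical, the address erica@example.com is a made-up example, emails are lowercased before the prefix is taken, and a leading + on a phone number is dropped so that the prefix is made of digits.

```java
import java.util.Locale;

public class PrefixBuckets {

    // Hypothetical helper: builds a hash name such as "email-users-er"
    // from the first two characters of the (lowercased) email address.
    public static String emailBucket(String email) {
        String lower = email.toLowerCase(Locale.ROOT);
        return "email-users-" + lower.substring(0, Math.min(2, lower.length()));
    }

    // Hypothetical helper: drops a leading '+' (an assumption, not part of
    // the answer) and uses the first two digits as the bucket suffix.
    public static String phoneBucket(String phone) {
        String digits = phone.startsWith("+") ? phone.substring(1) : phone;
        return "phone-users-" + digits.substring(0, Math.min(2, digits.length()));
    }

    public static void main(String[] args) {
        System.out.println(emailBucket("erica@example.com")); // email-users-er
        System.out.println(phoneBucket("123456789"));         // phone-users-12
    }
}
```

One thing to keep in mind with this scheme is that prefixes are unlikely to be evenly distributed, so some buckets may end up much larger than others.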

  2. "My question is: should I divide my users into these 1000 buckets? If
    yes, how can I search through these 1000 buckets? Or is there a
    better alternative?"

    Yes. The approach could work in the following way.

    For this example, let’s treat both the phone numbers and email Ids as strings.

    Let’s say you have the following buckets (Redis hashes):

    For Email Ids: email_0001, email_0002, ..., email_1000
    For Phone Numbers: phone_0001, phone_0002, ..., phone_1000
    
    1. Given an email Id, determine its bucket (one of 1000) by hashing the email Id. You can use consistent hashing for this purpose. Now add the key and value to the appropriate ‘bucket’.

      $ HSET "email_0032" "[email protected]" "UID_987"
      
    2. Repeat step 1 for phone numbers. This saves you from bookkeeping which key goes into which bucket: given the same key, the hash always gives the same value and therefore the same bucket number.

      $ HSET "phone_0091" "+192938384849" "UID_987"
      
    3. To retrieve a value, first find the bucket by hashing the email/phone and then look up the value in the appropriate bucket.

      $ HGET "phone_0091" "+192938384849"
        UID_987
      
    import java.nio.charset.StandardCharsets;
    import com.google.common.hash.HashFunction;
    import com.google.common.hash.Hashing;
    
    public class Sample {
    
        // Number of buckets (Redis hashes) per key type, not the size of one bucket.
        private static final int BUCKET_COUNT = 1000;
        private static final HashFunction HASH_FUNCTION = Hashing.murmur3_128();
    
        private Sample() {
        }
    
        // Maps a key to a bucket index in the range [0, buckets).
        public static int pickBucket(String key, int buckets) {
            return Hashing.consistentHash(HASH_FUNCTION.hashString(key, StandardCharsets.UTF_8).asLong(), buckets);
        }
    
        // Builds the Redis hash name, e.g. "email_0032".
        // Adds 1 because the buckets are named 0001..1000 while pickBucket returns 0..999.
        private static String bucketName(String prefix, String key) {
            return String.format("%s_%04d", prefix, pickBucket(key, BUCKET_COUNT) + 1);
        }
    
        public static void main(String[] args) {
            // Use the printed names as the hash argument of HSET / HGET.
            System.out.println(bucketName("email", "[email protected]"));
            System.out.println(bucketName("phone", "+192938384849"));
        }
    }
    

    The above example is in Java; I’m assuming PHP has similar libraries for hashing.
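For readers who want to try steps 1 to 3 without Guava or a running Redis server, here is a minimal sketch that uses only the JDK. It rests on several assumptions: it swaps consistent hashing for a plain CRC32 value modulo the bucket count (PHP also has a built-in crc32() function, which is why CRC32 is chosen here), the class and method names are hypothetical, and a nested in-memory map stands in for the Redis hashes.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

public class BucketedLookup {

    static final int BUCKET_COUNT = 1000;

    // In-memory stand-in for Redis: hash name -> (field -> value).
    private final Map<String, Map<String, String>> store = new HashMap<>();

    // Maps a key to a hash name such as "email_0032" using CRC32 modulo the bucket count.
    public static String bucketName(String prefix, String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        int bucket = (int) (crc.getValue() % BUCKET_COUNT) + 1; // 1..1000
        return String.format("%s_%04d", prefix, bucket);
    }

    // Plays the role of: HSET <bucket> <key> <userId>
    public void put(String prefix, String key, String userId) {
        store.computeIfAbsent(bucketName(prefix, key), k -> new HashMap<>()).put(key, userId);
    }

    // Plays the role of: HGET <bucket> <key>; returns null when the key is absent.
    public String get(String prefix, String key) {
        Map<String, String> bucket = store.get(bucketName(prefix, key));
        return bucket == null ? null : bucket.get(key);
    }

    public static void main(String[] args) {
        BucketedLookup lookup = new BucketedLookup();
        lookup.put("phone", "+192938384849", "UID_987");
        System.out.println(lookup.get("phone", "+192938384849")); // UID_987
    }
}
```

A lookup touches exactly one bucket, so there is no need to loop over all 1000 hashes. The trade-off of plain modulo is that changing the bucket count later moves most keys to a different bucket, whereas the consistent hashing used in the Guava example is designed to keep that movement small.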
