
I am designing a sharded database. A common design uses two columns: the first for the logical shard and the second to uniquely identify a row within that shard. Instead of that, I am planning to have just one column of a long (bigint) datatype as the primary key. To keep the key unique across all servers, I plan to use a bigserial on each server that generates a non-overlapping range:

server   PK starts from     PK ends at
1        1                  9,999,999,999
2        10,000,000,000     19,999,999,999
3        20,000,000,000     29,999,999,999
4        30,000,000,000     39,999,999,999
5        40,000,000,000     49,999,999,999

and so on.
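For illustration, here is a rough sketch of how I could enforce one server's range with a plain sequence (PostgreSQL assumed, since bigserial is a PostgreSQL type; the orders table and sequence name are just placeholders). Server 2 would look roughly like this:

    -- Sequence confined to server 2's range, so generated ids can never
    -- overlap with ids generated on the other servers.
    CREATE SEQUENCE orders_id_seq
        START WITH 10000000000
        MINVALUE   10000000000
        MAXVALUE   19999999999
        NO CYCLE;  -- raise an error instead of wrapping into another server's range

    CREATE TABLE orders (
        id bigint PRIMARY KEY DEFAULT nextval('orders_id_seq')
        -- ... other columns
    );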

In future I should be able to

  1. Split large server to two or more small servers
  2. Join two or more small servers to 1 big server
  3. Move some rows from Server A to Server B for better utilization of resources.

I will also have a lookup table which will contain information on each range and its target server.

I would like to learn about the drawbacks of this approach.

2 Answers


  1. I recommend that you create the primary key column on server 1 like this:

    CREATE TABLE ... (
       id bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1000),
       ...
    );
    

    On the second server you use START WITH 2, and so on.

    Adding a new shard is easy: just use a new start value. Splitting or joining shards is trivial (as far as the primary keys are concerned…), because new values can never conflict with old ones, and each shard generates different values.
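    As a rough sketch of the same pattern on the other shards (the table and column names here are just placeholders):

    -- server 2: generates ids 2, 1002, 2002, ...
    CREATE TABLE t (
       id bigint GENERATED ALWAYS AS IDENTITY (START WITH 2 INCREMENT BY 1000) PRIMARY KEY
       -- ... other columns
    );

    -- server 3: generates ids 3, 1003, 2003, ...
    CREATE TABLE t (
       id bigint GENERATED ALWAYS AS IDENTITY (START WITH 3 INCREMENT BY 1000) PRIMARY KEY
       -- ... other columns
    );

    Each server starts at a different offset and every identity column steps by 1000, so this scheme supports up to 1000 shards whose ids interleave without ever colliding.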

  2. The two most common types of sharding keys are basically:

    • Based on a deterministic expression, like the modulus-based method suggested in @LaurenzAlbe’s answer.

    • Based on a lookup table, like the method you describe in your post.

    A drawback of the latter type is that your app has to check the lookup table frequently (perhaps even on every query), because the ranges can change. The ranges stored in the lookup table are a good candidate for caching, to avoid frequent SQL queries; just replace the cached ranges when you change them, which I assume will be infrequent. With a cache, it’s a pretty modest drawback.
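    For illustration, the lookup could be as simple as this sketch (the shard_map table and its column names are placeholders of mine, not something from your post):

    -- Maps each id range to the server that stores it.
    CREATE TABLE shard_map (
        range_start bigint NOT NULL,
        range_end   bigint NOT NULL,
        server_id   int    NOT NULL,
        PRIMARY KEY (range_start)
    );

    -- Which server holds id 12345? The app can cache the ranges and only
    -- re-read shard_map after they have been changed.
    SELECT server_id
    FROM shard_map
    WHERE 12345 BETWEEN range_start AND range_end;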

    I worked on a system like this, where we had a schema for each of our customers, distributed over 8 shards. At the start of each session, the app queried the lookup table to find out which shard the respective customer’s data was stored on. Once a year or so, we would move some customer schemas around to new shards, because naturally they tend to grow. This included updating the lookup table. It wasn’t a bad solution.

    I suspect that you will eventually have multiple non-contiguous ranges per server, because there will be hotspots of data or traffic, and it makes sense to split ranges when you want to move the least amount of data:

    server   PK starts from     PK ends at
    1        1                  9,999,999,999
    2        10,000,000,000     19,999,999,999
    3        20,000,000,000     29,999,999,999
    4        30,000,000,000     32,999,999,999
    3        33,000,000,000     39,999,999,999
    5        40,000,000,000     49,999,999,999

    If you anticipate moving subsets of data from time to time, this can be a better design than the expression-based type of sharding. If you use an expression, and need to move data between shards, you must either make a more complex expression, or else rebalance a lot of data.
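    For example, moving the upper part of server 4’s range over to server 3, as in the table above, only requires copying those rows and making a small change to the hypothetical shard_map sketched earlier:

    BEGIN;
    -- Shrink server 4's range ...
    UPDATE shard_map
       SET range_end = 32999999999
     WHERE range_start = 30000000000;
    -- ... and register the moved part as a new, non-contiguous range on server 3.
    INSERT INTO shard_map (range_start, range_end, server_id)
    VALUES (33000000000, 39999999999, 3);
    COMMIT;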
