
According to the Azure Cosmos DB documentation at the URL below, each partition key value creates a logical partition.

https://learn.microsoft.com/en-us/azure/cosmos-db/partitioning-overview

Let's say I have the data below:

[
    {
        "firstname": "Phil",
        "LastName": "Dixon",
        "age": 28,
        "org": "Fin",
        "Level": 3,
        "region": "India",
        "id": "123",
        "which-city": "Bangalore"
    },
    {
        "userID": 1,
        "Name": "Bob",
        "Hobbies": "Dancing",
        "Region": "USA"
    },
    {
        "userID": 2,
        "Name": "Anna",
        "Hobbies": "Dancing",
        "Region": "USA"
    },
    {
        "userID": 3,
        "Name": "Phil",
        "Hobbies": "Dancing",
        "Region": "USA"
    },
    {
        "userID": 4,
        "Name": "Jog",
        "Hobbies": "Dancing",
        "Region": "India"
    },
    {
        "userID": 5,
        "Name": "Maxi",
        "Hobbies": "Playing",
        "Region": "India"
    },
    {
        "userID": 6,
        "Name": "Capi",
        "Hobbies": "Playing",
        "Region": "Japan"
    }
]

If I choose userID as the partition key, each item gets its own logical partition. Does that slow down performance?

From the documentation, I understand that region might be the right partition key for my use case. But I would like to understand what happens, in terms of performance, if I choose userID as the partition key versus region as the partition key.

More information:
When userID is the partition key, I query by the userID property.
When region is the partition key, I query by the region property.

API: SQL

2 Answers


  1. The more logical partitions you have, the better, especially if your queries filter on the partition key.

    An increase in the number of logical partitions does not hurt performance, but you will only see significant gains once the data grows really large. Also, having many logical partitions does not mean you get an equal number of physical partitions; many logical partitions share each physical partition.
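    To make the logical-vs-physical distinction concrete, here is a minimal Python sketch that hashes each distinct partition key value onto a fixed number of physical partitions. It is a simplification under stated assumptions: md5 plus modulo is only a stand-in for the service's internal hashing and hash-range assignment, and the 10,000 user IDs and 4 physical partitions are hypothetical.

```python
import hashlib
from collections import Counter


def physical_partition_for(pk_value, physical_count):
    # Stand-in hash (assumption): Cosmos DB uses its own hashing and hash-range
    # assignment; md5 + modulo only illustrates the many-to-one mapping.
    digest = hashlib.md5(str(pk_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_count


def map_logical_to_physical(pk_values, physical_count):
    """Return {physical partition index: number of logical partitions on it}."""
    logical_partitions = set(pk_values)  # one logical partition per distinct key value
    return Counter(physical_partition_for(v, physical_count) for v in logical_partitions)


if __name__ == "__main__":
    layout = map_logical_to_physical(range(1, 10_001), physical_count=4)  # hypothetical sizes
    for index, logical_count in sorted(layout.items()):
        print(f"physical partition {index}: {logical_count} logical partitions")
```

    Thousands of logical partitions end up sharing a handful of physical ones, which is why a high logical-partition count is not a cost in itself.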

  2. If I choose userID as the partition key, each item gets its own logical partition. Does that slow down performance?

    NO, having more logical partitions does NOT (usually) slow your performance.

    To be more precise, it definitely does NOT slow your performance as long as your query filters by partition key. Generally, you should aim for the smallest possible partitions while avoiding too many cross-partition queries. It really depends on which queries you intend to execute and how often. Write them out and see which ones would be cross-partition.
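    One way to "write them out" is a small inventory of planned queries. The sketch below is hypothetical: the `PlannedQuery` type, the run counts, and the rule that only an equality filter on the key makes a query single-partition are all simplifying assumptions. It reports what share of daily executions would be cross-partition for a given partition key property.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedQuery:
    # Hypothetical planning record, not part of any SDK.
    name: str
    equality_filters: set = field(default_factory=set)  # properties pinned with "="
    runs_per_day: int = 0


def is_single_partition(query, partition_key):
    # Simplification: only an equality filter on the partition key property is
    # modeled as routing the query to a single logical partition.
    return partition_key in query.equality_filters


def cross_partition_share(queries, partition_key):
    """Fraction of daily query executions that would be cross-partition."""
    total = sum(q.runs_per_day for q in queries)
    cross = sum(q.runs_per_day for q in queries if not is_single_partition(q, partition_key))
    return cross / total if total else 0.0


if __name__ == "__main__":
    planned = [
        PlannedQuery("load one user", {"userID"}, runs_per_day=10_000),
        PlannedQuery("list users in a region", {"Region"}, runs_per_day=100),
    ]
    for key in ("userID", "Region"):
        print(key, f"{cross_partition_share(planned, key):.1%} cross-partition")
```

    Weighting by frequency matters: a rare cross-partition report is usually acceptable, while a hot path that fans out on every call is not.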

    Cross-partition queries MAY become slower and costlier, but this only starts to matter noticeably once your data set grows to span many physical partitions (well over 10 GB). Note that cross-partition queries are generally OK as long as they have a selective, indexed predicate.
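    A rough way to see when fan-out starts to matter is to estimate how many physical partitions a container needs. This sketch assumes the per-physical-partition limits documented for Azure Cosmos DB at the time of writing (50 GB of storage and 10,000 RU/s); it only gives a lower bound, and the service may allocate more. A query without a partition-key filter may have to visit each of those partitions.

```python
import math

# Per-physical-partition limits as documented for Azure Cosmos DB (assumed current).
MAX_GB_PER_PHYSICAL_PARTITION = 50
MAX_RU_PER_PHYSICAL_PARTITION = 10_000


def min_physical_partitions(storage_gb, provisioned_ru):
    """Rough lower bound on physical partitions needed for the given storage and RU/s.

    A cross-partition query may have to fan out to every one of them."""
    by_storage = math.ceil(storage_gb / MAX_GB_PER_PHYSICAL_PARTITION)
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_PHYSICAL_PARTITION)
    return max(1, by_storage, by_throughput)


if __name__ == "__main__":
    for gb in (1, 60, 500):
        count = min_physical_partitions(gb, 400)
        print(f"{gb} GB at 400 RU/s -> at least {count} physical partition(s)")
```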

    Also consider write scenarios, which benefit from writes being spread over many partitions at any given time. This only matters for heavy write bursts over many physical partitions, though. Writing 100 docs at a time is no big deal.
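    The effect of a burst can be sketched by counting how many writes land on the busiest physical partition. Because provisioned throughput is split evenly across physical partitions, a peak share close to 1.0 means the burst is limited to roughly 1/N of the container's RU/s. As before, md5 plus modulo is only a stand-in for the real hashing, and the burst of 1,000 simultaneous writes from India is hypothetical.

```python
import hashlib
from collections import Counter


def physical_partition_for(pk_value, physical_count):
    # Stand-in for the service's internal hashing (assumption: md5 + modulo).
    digest = hashlib.md5(str(pk_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_count


def peak_partition_load(writes, pk_field, physical_count):
    """Fraction of a write burst that lands on the busiest physical partition."""
    counts = Counter(physical_partition_for(w[pk_field], physical_count) for w in writes)
    return max(counts.values()) / len(writes)


if __name__ == "__main__":
    # Hypothetical burst: 1,000 different users in the same region writing at once.
    burst = [{"userID": i, "Region": "India"} for i in range(1, 1001)]
    physical = 10
    print("by Region:", peak_partition_load(burst, "Region", physical))  # all on one partition
    print("by userID:", peak_partition_load(burst, "userID", physical))  # spread out, near 1/10
```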

    Also consider the maximum partition size (a single logical partition is capped at 20 GB). A good partition key should put a natural limit on each partition's growth. Imagine your worst partition and plan for the heaviest imaginable growth over a reasonably long time period.
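    Planning for the worst partition can be as simple as compounding its growth until it reaches the 20 GB logical partition cap. The starting sizes and growth rates in this sketch are hypothetical placeholders, not measurements.

```python
LOGICAL_PARTITION_LIMIT_GB = 20  # documented maximum size of one logical partition


def years_until_cap(current_gb, annual_growth_rate,
                    limit_gb=LOGICAL_PARTITION_LIMIT_GB, horizon_years=50):
    """Years of compound growth until a partition reaches limit_gb,
    or None if it stays below the limit for the whole horizon."""
    size = current_gb
    for year in range(horizon_years + 1):
        if size >= limit_gb:
            return year
        size *= 1 + annual_growth_rate
    return None


if __name__ == "__main__":
    # Hypothetical worst cases: a busy region vs. a single heavy user.
    print("worst region:", years_until_cap(2, 0.6))
    print("worst user:", years_until_cap(0.001, 0.1))
```

    With these placeholder numbers, the region partition hits the cap in 5 years, while the single user never gets close within the 50-year horizon.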

    Also consider other document types you may want to store in the same container in the future, for example user preferences, companies, subregions, and whatnot. They will have their own required queries, and the partition key choice matters for them as well.

    What happens, in terms of performance, if I choose userID as the partition key versus region as the partition key?

    It depends on your queries, growth patterns, and many other aspects, but …

    Region seems risky:

    • It has the potential to grow huge.
    • It can produce hot partitions that get throttled, because throughput is split between >>physical<< partitions. The same region most likely produces/consumes data at the same times.
    • Partitions will be very uneven, as regions differ greatly in size and usage, and they will grow even more uneven over time.
    • You have both region and Region in your samples; property names are case-sensitive, so these are not the same thing.
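    Using the sample documents from the question, this sketch counts how many documents each partition key value would hold for `Region` versus `userID`. The lookup is case-sensitive, like a Cosmos DB partition key path, so the first document (which uses `region`) shows up as missing rather than in the India partition. The `<missing>` bucket is just a label for this sketch.

```python
from collections import Counter

# Sample documents from the question (the first one uses "region", the rest "Region").
docs = [
    {"firstname": "Phil", "LastName": "Dixon", "age": 28, "org": "Fin", "Level": 3,
     "region": "India", "id": "123", "which-city": "Bangalore"},
    {"userID": 1, "Name": "Bob", "Hobbies": "Dancing", "Region": "USA"},
    {"userID": 2, "Name": "Anna", "Hobbies": "Dancing", "Region": "USA"},
    {"userID": 3, "Name": "Phil", "Hobbies": "Dancing", "Region": "USA"},
    {"userID": 4, "Name": "Jog", "Hobbies": "Dancing", "Region": "India"},
    {"userID": 5, "Name": "Maxi", "Hobbies": "Playing", "Region": "India"},
    {"userID": 6, "Name": "Capi", "Hobbies": "Playing", "Region": "Japan"},
]

MISSING = "<missing>"


def partition_sizes(documents, pk_property):
    # Exact, case-sensitive property lookup: "region" does not match "Region".
    return Counter(d.get(pk_property, MISSING) for d in documents)


if __name__ == "__main__":
    print("Region:", partition_sizes(docs, "Region"))
    print("userID:", partition_sizes(docs, "userID"))
```

    Even on seven documents, `Region` already yields uneven partitions (3/2/1), while `userID` yields one document per partition.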

    UserId, on the other hand:

    • gives small partitions (= fewer documents to scan in in-partition queries).
    • will scale well: presumably any big growth will come from new users rather than existing users growing their partitions endlessly.
    • presumably will not create hot partitions, as the users active at any given time will be spread roughly evenly across physical partitions.
    • is much less likely to ever hit the partition size cap (compared to regions).

    Summary:

    Plan your "document types" and your most commonly executed queries, estimate the data distribution over partitions and indexes, and consider expected growth patterns.

    From afar, it seems better to partition by UserId and just index Region. If in doubt, generate some test data and measure the RU charge of different queries.
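    A minimal generator for such test data might look like this. The region weights, names, and record count are hypothetical assumptions to replace with your own estimates. The output is JSON Lines that you could load into two test containers, one partitioned on `/userID` and one on `/Region`, then run the same queries against both and compare the request charge (RU) each response reports.

```python
import json
import random
from collections import Counter

# Hypothetical region mix and hobbies; replace with your own estimates.
REGION_WEIGHTS = {"USA": 0.5, "India": 0.35, "Japan": 0.15}
HOBBIES = ["Dancing", "Playing"]


def generate_users(count, seed=42):
    """Yield synthetic user documents shaped like the samples in the question."""
    rng = random.Random(seed)  # fixed seed -> reproducible data set
    regions = list(REGION_WEIGHTS)
    weights = list(REGION_WEIGHTS.values())
    for user_id in range(1, count + 1):
        yield {
            "id": str(user_id),  # Cosmos DB item ids are strings
            "userID": user_id,
            "Name": f"user{user_id}",
            "Hobbies": rng.choice(HOBBIES),
            "Region": rng.choices(regions, weights=weights, k=1)[0],
        }


def write_jsonl(path, docs):
    """Write one JSON document per line, ready for a bulk-import tool or script."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc) + "\n")


if __name__ == "__main__":
    sample = list(generate_users(10_000))
    print(Counter(doc["Region"] for doc in sample))
```

    For example, `write_jsonl("users.jsonl", generate_users(10_000))` writes the file; the fixed seed keeps both containers loaded with identical data so the RU comparison is fair.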
