According to the Azure Cosmos DB documentation at the URL below, each distinct partition key value creates a logical partition.
https://learn.microsoft.com/en-us/azure/cosmos-db/partitioning-overview
Let's say I have the data below:
{
"firstname": "Phil",
"LastName": "Dixon",
"age": 28,
"org": "Fin",
"Level": 3,
"region": "India",
"id": "123",
"which-city": "Bangalore"
},
{
"userID": 1,
"Name": "Bob",
"Hobbies": "Dancing",
"Region": "USA"
},
{
"userID": 2,
"Name": "Anna",
"Hobbies": "Dancing",
"Region": "USA"
},
{
"userID": 3,
"Name": "Phil",
"Hobbies": "Dancing",
"Region": "USA"
},
{
"userID": 4,
"Name": "Jog",
"Hobbies": "Dancing",
"Region": "India"
},
{
"userID": 5,
"Name": "Maxi",
"Hobbies": "Playing",
"Region": "India"
},
{
"userID": 6,
"Name": "Capi",
"Hobbies": "Playing",
"Region": "Japan"
},
If I choose userID as the partition key, a separate logical partition is created for each item. Does that slow down performance?
From the documentation, I understand that region might be the right partition key for my use case. But I would like to understand what happens, in terms of performance, if I choose userID as the partition key versus region as the partition key.
More information:
When userID is the partition key, I query against the userID property.
When region is the partition key, I query against the region property.
API: SQL
2 Answers
The more logical partitions you have, the better, especially if you are filtering your queries on your partition key.
An increase in logical partitions does not affect performance negatively. But you will only see significant gains if it grows really big. Also having a lot of logical partitions does not translate to having an equal number of physical partitions.
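To illustrate that last point, here is a minimal sketch of many logical partitions being hashed onto a handful of physical ones. The SHA-256-and-modulo mapping, the `physical_partition_for` helper, and the count of 4 physical partitions are assumptions for illustration only; Cosmos DB's real hashing and range assignment are internal to the service.

```python
import hashlib
from collections import Counter


def physical_partition_for(partition_key_value: str, physical_partition_count: int) -> int:
    """Map a logical partition key value to a physical partition index.

    Illustrative only: this is NOT the hash function Cosmos DB uses.
    """
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_partition_count


# 10,000 hypothetical userID values, i.e. 10,000 logical partitions.
user_ids = [str(i) for i in range(10_000)]

# Count how many logical partitions land on each of 4 assumed physical partitions.
placement = Counter(physical_partition_for(uid, 4) for uid in user_ids)

print(len(user_ids), "logical partitions on", len(placement), "physical partitions")
print(dict(sorted(placement.items())))
```

Each physical partition ends up hosting thousands of logical partitions, which is why a high-cardinality key such as userID does not by itself multiply the physical layout.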
NO, having more logical partitions does NOT (usually) slow your performance.
To be more precise, it definitely does NOT slow your performance as long as your query filters by partition key. Generally you should aim for the smallest possible partitions while avoiding too many cross-partition queries. It really depends on which queries you intend to execute and how often. Write them out and see which ones would be cross-partition.
Cross-partition queries MAY become slower and costlier, but it only starts to matter noticeably once your data set grows to many physical partitions (= 10+ GB). Do note that cross-partition queries are generally OK, as long as they have a good selective predicate which is indexed.
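A rough sketch of the two query shapes, assuming the `azure-cosmos` Python SDK and a container partitioned on `/userID`. The function names are hypothetical; `container` is expected to be a `ContainerProxy` (or anything exposing a compatible `query_items` method).

```python
def query_by_user_id(container, user_id):
    """Single-partition query: the filter matches the (assumed) partition key /userID."""
    return list(container.query_items(
        query="SELECT * FROM c WHERE c.userID = @userID",
        parameters=[{"name": "@userID", "value": user_id}],
        partition_key=user_id,  # routes the query to one logical partition
    ))


def query_by_region(container, region):
    """Cross-partition query: Region is not the partition key here, so the query fans out."""
    return list(container.query_items(
        query="SELECT * FROM c WHERE c.Region = @region",
        parameters=[{"name": "@region", "value": region}],
        enable_cross_partition_query=True,  # must be requested explicitly
    ))
```

With a real account, `container` would typically come from `CosmosClient(...).get_database_client(...).get_container_client(...)`. If the container were partitioned on `/Region` instead, the roles of the two functions would simply swap.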
Also consider write scenarios. Writes benefit from being spread over many partitions at any given time. It only matters when you do heavy bursts over many physical partitions, though. Writing 100 docs at a time is no biggie.
Also consider max partition size. A good partition should have a logical limit of growth somewhere. Imagine your worst partition and plan for the heaviest imaginable growth over a reasonably long time period.
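A back-of-the-envelope way to do that planning, as a sketch. The 20 GB figure is the documented size limit of a single logical partition; the daily volumes, document size, and time horizon are made-up numbers, and `projected_partition_size_gb` is a hypothetical helper.

```python
LOGICAL_PARTITION_LIMIT_GB = 20  # documented Cosmos DB limit per logical partition


def projected_partition_size_gb(docs_per_day: float, avg_doc_kb: float, years: float) -> float:
    """Estimate how large one logical partition grows over the given period."""
    total_kb = docs_per_day * 365 * years * avg_doc_kb
    return total_kb / (1024 * 1024)  # KB -> GB


# Hypothetical worst cases: the busiest region vs. the busiest single user.
region_gb = projected_partition_size_gb(docs_per_day=50_000, avg_doc_kb=1.0, years=5)
user_gb = projected_partition_size_gb(docs_per_day=20, avg_doc_kb=1.0, years=5)

for name, size_gb in [("busiest Region", region_gb), ("busiest userID", user_gb)]:
    verdict = "exceeds" if size_gb > LOGICAL_PARTITION_LIMIT_GB else "fits within"
    print(f"{name}: ~{size_gb:.2f} GB, {verdict} the {LOGICAL_PARTITION_LIMIT_GB} GB limit")
```

Under these assumed numbers a region-sized partition outgrows the limit while a per-user partition stays tiny; plug in your own estimates before drawing conclusions.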
Also consider other documents you’d like to store in the same container (in the future). For example, user preferences, companies, subregions, whatnot. They would have their own required queries and partition key choice matters for them as well.
It depends on your queries, growth patterns, and many other aspects, but …
Region seems to be risky: region and Region in your samples are not the same thing.
UserId, on the other hand, is:
Summary:
Plan your "document types" and the most commonly executed queries, estimate the data distribution across partitions and indexes, and consider expected growth patterns.
From afar, it seems better to partition by UserId and just index on region. If in doubt, generate some data and measure the RU of different queries.
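Before generating real data, a quick local count over the sample documents from the question already shows how the two candidate keys would distribute items. This is only a sketch: the documents are trimmed to the relevant properties, `partition_counts` is a hypothetical helper, and grouping items that lack the property under `None` stands in for how an item without the partition key property ends up outside the named partitions.

```python
from collections import Counter

# Sample items from the question, trimmed to the properties that matter here.
docs = [
    {"id": "123", "firstname": "Phil", "region": "India"},  # lowercase "region", no userID
    {"userID": 1, "Name": "Bob", "Region": "USA"},
    {"userID": 2, "Name": "Anna", "Region": "USA"},
    {"userID": 3, "Name": "Phil", "Region": "USA"},
    {"userID": 4, "Name": "Jog", "Region": "India"},
    {"userID": 5, "Name": "Maxi", "Region": "India"},
    {"userID": 6, "Name": "Capi", "Region": "Japan"},
]


def partition_counts(documents, key):
    """Count items per partition key value; items lacking the property go under None."""
    return Counter(doc.get(key) for doc in documents)


by_region = partition_counts(docs, "Region")  # property names are case-sensitive
by_user = partition_counts(docs, "userID")

print("by Region:", dict(by_region))
print("by userID:", dict(by_user))
```

Partitioning by Region gives a few uneven groups, and the first item falls outside all of them because its property is spelled `region`. Partitioning by userID gives one item per key, apart from that same first item, which has no userID at all.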