Azure Cosmos Indexing only "id"

jlyh
April 6, 2023

How do I define my indexing policy if I want to index only "id" (which is indexed by default) in an Azure Cosmos DB container? With the policy below, would id be dropped from the index, or would nothing be indexed except the default id? I looked through the documentation but couldn’t find anything explicit about this.

indexing_policy {
  indexing_mode = "consistent"

  excluded_path {
    path = "/*"
  }
}

Answers

- silent
- April 6, 2023 at 2:30 pm
- 0 votes
Yes, /id is always indexed.

If the indexing mode is set to consistent, the system properties id and _ts are automatically indexed.

Source

So yes, your indexing policy should give you what you are looking for.


- MarkBrown
- April 6, 2023 at 4:33 pm
- 0 votes
Are you partitioning by /id as well and using your container as a key-value store?

If so, you can turn off indexing altogether, because you will not need (or want) to use queries for single-document operations. Point operations, which take the id and partition key values as arguments, do not require you to define an index at all.
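As a rough sketch of what that key-value usage could look like in Python: the helper below assumes `container` behaves like the `ContainerProxy` from the azure-cosmos SDK (whose `read_item` takes `item` and `partition_key`), and that the container is partitioned by /id. The names `NO_INDEX_POLICY` and `point_read` are made up for illustration.

```python
from typing import Any, Dict

# Assumed policy for a pure key-value container: no index is maintained.
# With indexingMode "none", "automatic" is set to False.
NO_INDEX_POLICY: Dict[str, Any] = {"indexingMode": "none", "automatic": False}


def point_read(container: Any, item_id: str) -> Dict[str, Any]:
    """Fetch one document by id from a container partitioned by /id.

    `container` is assumed to expose read_item(item=..., partition_key=...),
    as the azure-cosmos ContainerProxy does. No query and no index are involved.
    """
    # With /id as the partition key, the id doubles as the partition key value.
    return container.read_item(item=item_id, partition_key=item_id)
```

Because the lookup is addressed by id and partition key directly, it does not depend on the indexing policy. Any filter on another property would need a query, and therefore an index.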

Update:

Additional comments regarding your design, which uses /id as the partition key but then runs range queries on deviceId and a Unix timestamp.

In IoT scenarios (which this sounds like), using /id as your partition key with a random GUID (which it sounds like you are also doing) will not scale. Additionally, the fact that you are actually going to run queries across your data means you must define an index. My answer ONLY applies if you NEVER run queries and only use point CRUD operations.

The challenge you may face is that, once your container grows beyond 10K RU/s or 50GB of storage, it will go through a partition split and redistribute your data across the two resulting physical partitions. As that continues, more physical partitions get added, and your data is spread among them, your queries will get increasingly slower and more expensive. The design will fundamentally not scale, which is basically the opposite of why you would use this type of database (infinite scale with low latency).
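To make the arithmetic concrete, here is a back-of-the-envelope estimate in Python based only on the two limits quoted above. The function name is hypothetical, and the real partition count is managed by the service and may differ, so treat this as an approximation rather than a prediction.

```python
import math

MAX_RU_PER_PARTITION = 10_000  # RU/s per physical partition, as quoted above
MAX_GB_PER_PARTITION = 50      # GB of storage per physical partition, as quoted above


def estimate_physical_partitions(provisioned_ru: int, storage_gb: float) -> int:
    """Rough lower bound on the number of physical partitions (assumption)."""
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_PARTITION)
    by_storage = math.ceil(storage_gb / MAX_GB_PER_PARTITION)
    # Whichever limit is hit first drives the split; there is always at least one.
    return max(1, by_throughput, by_storage)
```

For example, `estimate_physical_partitions(10_000, 400)` gives 8. A query that filters on deviceId while the container is partitioned by /id would have to visit every one of those partitions.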

A more typical design for an IoT solution would look like this.
1. An ingestion container partitioned by /deviceId. This is where high-volume ingestion occurs. It is append-only, with a TTL defined on the container to prevent any logical partition from growing beyond 20GB. Note: indexing must stay turned on for TTL to work, so keep indexing on and use the indexing policy below.
2. One or more additional containers, each designed to serve specific queries (the materialized view pattern). These are partitioned and indexed by whatever best optimizes the queries they serve. I can’t say specifically what that partition key would be, but you should design it so that each query can be answered as an in-partition query, meaning the query includes an equality expression on the partition key in the WHERE clause, such as c.deviceId = @deviceIdValue. Depending on the query, define indexes on one or more properties, with optional composite indexes where you have range expressions and/or ORDER BYs on other properties. (PS: you cannot do range filters across partition keys.)
3. Finally, use Change Feed, hosted in an Azure Function or any other compute, to listen for inserts on the first container and write each item into one or more of the containers created to serve your queries. Because most of these solutions use TTL to expire data and keep storage efficient, users usually take the opportunity to also write the data to blob storage for cold storage.

The caveat in all this is that you need to measure to understand whether you need any secondary containers. If your workload is very small, you don’t need to do any of this. But as it grows, these patterns are designed to help with scalability.
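For illustration, an in-partition range query like the one described in step 2 could be built as a parameterized query spec in Python. This is only a sketch: the property name `ts` (a Unix timestamp) and the function name are assumptions, since the actual document schema isn't shown.

```python
from typing import Any, Dict


def device_range_query(device_id: str, start_ts: int, end_ts: int) -> Dict[str, Any]:
    """Build a parameterized query for one device over a time window.

    Assumes documents carry `deviceId` and a numeric `ts` property
    (hypothetical names), and that the container is partitioned by /deviceId.
    """
    return {
        "query": (
            "SELECT * FROM c "
            "WHERE c.deviceId = @deviceId "        # equality on the partition key
            "AND c.ts >= @startTs AND c.ts < @endTs "  # range filter inside the partition
            "ORDER BY c.ts"
        ),
        "parameters": [
            {"name": "@deviceId", "value": device_id},
            {"name": "@startTs", "value": start_ts},
            {"name": "@endTs", "value": end_ts},
        ],
    }
```

With the azure-cosmos Python SDK, the `query` and `parameters` values could be passed to `container.query_items(...)` together with `partition_key=device_id`, so the query is served from a single logical partition. For this to be efficient, `ts` must be covered by the container's indexing policy.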

Sample Minimum Indexing Policy to use TTL

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [],
  "excludedPaths": [
    {
      "path": "/*"
    }
  ]
}