
I’m trying to filter through millions of documents. Each document is about 7kB in size.
For the moment, I’m filtering on a maximum of 8 (defined) fields, but more will be added later.

Filters can be anything, for example (a combined query is sketched after this list):

  • "field": "value"
  • "field: {"$gte": value}
  • "field": {"$regex": "regexp"}
  • "field": {"$in": […]}
  • etc…
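
For reference, a single query might combine several of these filter types at once; collection and field names here are just placeholders:

    db.documents.find({
      status: "active",                     // exact value match
      price: { $gte: 100 },                 // range filter
      title: { $regex: "^report" },         // regular expression
      category: { $in: ["a", "b", "c"] }    // set membership
    })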

I want queries to be fast, so I've created indexes.
But I don't know whether I need to create every possible index.
Example:
For fields a, b, c, d, e, f, g, h:

  • should I create a single compound index with all these fields?
  • or create every combination of indexes:
    • a
    • a,b
    • […]
    • b,e,g,h
    • etc…

The problem with that approach is that you're limited to 64 indexes per collection.

But even with the right indexes, the search takes a long time sometimes.

I also have a wildcard index on an object field. I tried to create a compound index that includes this wildcard index, but it is never used, so the query is quite slow when I search on my normal filters plus the field covered by the wildcard index.

Is there a solution for having a larger number of filterable fields without running into index or performance problems?
Maybe create a smaller collection containing just these filter fields? Or create a collection for every filter, query each one, and intersect the results to find the correct documents?
I don't know…

Thanks a lot!

3 Answers


  1. Indexes are among the most important tools for speeding up queries; however, they should fit in RAM, so you should create only the indexes you actually need. This also depends on the data model you are using. Creating many indexes in lots of variations puts a lot of load on the cluster's RAM, even if you manage to stay under the maximum of 64 indexes.
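
    To get a feel for how much memory your indexes require, you can compare their total size against the available RAM; a minimal check in the shell (the collection name is a placeholder) might look like this:

    // Total size of all indexes on the collection, in bytes
    db.documents.totalIndexSize()

    // Per-index sizes from the collection statistics
    db.documents.stats().indexSizes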

    In order to reduce the number of indexes needed, have a look at the attribute pattern. This pattern proposes moving attributes from dedicated fields into an attribute array, e.g. (sample taken from the blog post):

    The following schema

    {
      // ...
      "releasedate_en": ISODate("2024-01-20T00:00:00Z"),
      "releasedate_de": ISODate("2024-01-22T00:00:00Z"),
      // ...
    }
    

    is restructured like this:

    {
      // ...
      "releases": [
        { "location": "en", "date": ISODate("2024-01-20T00:00:00Z") },
        { "location": "de", "date": ISODate("2024-01-22T00:00:00Z") }
      ],
      // ...
    }
    

    Instead of creating dedicated indexes for all the attributes, you can create a single index that covers the attribute name and value, e.g.

    { "releases.location": 1, "releases.date": 1}
    

    In the sample, the attributes have a very homogeneous structure, but from a technical point of view you can also group attributes with different value types in the array.

    Using this pattern, you can reduce the number of indexes dramatically, and you do not have to cover every combination of fields with a separate index.
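
    A minimal sketch of how this could be used, assuming a hypothetical collection named products; note that $elemMatch keeps both conditions on the same element of the releases array:

    // Single compound index covering attribute name and value
    db.products.createIndex({ "releases.location": 1, "releases.date": 1 })

    // Query one attribute by name and value; $elemMatch ensures both
    // conditions apply to the same array element
    db.products.find({
      releases: {
        $elemMatch: {
          location: "de",
          date: { $gte: ISODate("2024-01-01T00:00:00Z") }
        }
      }
    })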

  2. You don't need indexes on low-cardinality fields; create indexes only on selective fields. For example, an index on a field gender that contains only [male, female, others] would be useless in most cases.

    Note that normal indexes are not used efficiently in every case for $regex; see Index Use in the documentation.
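
    As a rough illustration of that note (collection and field names are placeholders): a case-sensitive regular expression anchored to the start of the string can use an index on the field efficiently, while an unanchored pattern has to scan all of the index keys:

    // Efficient: case-sensitive prefix expression can use an index on "name"
    db.documents.find({ name: { $regex: "^abc" } })

    // Much slower: an unanchored pattern must scan every key in the index
    db.documents.find({ name: { $regex: "abc" } })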

  3. Regarding your question about creating every combination of indexes: the answer is no. A collection with many indexes suffers from reduced write performance, longer index builds, and increased RAM usage. In addition, having too many indexes can lead the query planner astray, so it may pick an index that is not optimal.

    As for creating a single compound index with all these fields: the answer is also no. See the post for how to create indexes that cover the query and sort cases you actually need. If you have many candidate indexes, you can use hint() to figure out which one is the most suitable, as sketched below.
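
    A short sketch of using hint() to compare plans (index and field names are hypothetical):

    // Force the planner to use a specific index and inspect the resulting plan
    db.documents.find({ a: 1, b: { $gte: 5 } })
      .hint({ a: 1, b: 1 })
      .explain("executionStats")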
