skip to Main Content

When an optional key in a collection is queried , performance varied with respect to the following two queries.

  1. Good response : When it queried with a specific value – a value other than null, it resulted good response time.

  2. Poor response : When it queried with the operator $exists, it resulted poor response time.

This difference in performance should be due to difference in the execution plans chose by query optimiser. However I am not able to understand the reasoning behind it.

Therefore I request if you can explain the reason for this difference in performance, it would help me to increase and confirm my understanding.

Collection Statistics

  • Number of documents : 100,000
  • Name of the optional key : field_A
  • Present status of field_A : It does not exists in any of the documents.
  • Index : { field_A : 1}

The two queries:

Query 1 – with Good response time

  • Query document : { field_A : }
  • Execution time : 0.5 seconds
  • Number of documents returned : 0

Query 2 – with poor response time

  • Query document : { field_A : { $exists : true } }
  • Execution time : 5 seconds
  • Number of documents returned : 0

2

Answers


  1. Most likely it uses the index.

    An index (more precisely a B*Tree index) hss a sorted list of all your values. So, in your case the index hold 100k times the value undefined or similar. You have to read all these values to determine whether the value is undefined or not.

    On the other scenario, when you look for some_value, then you know the result already after reading the first entry from the index, because it would appear before the first undefined value in case the value would exists.

    Don’t take above explanation too literally, it just explains the principle.

    Just a note, many database do not store any value in index if the value does not exist. In your case the index size would be 0 Bytes. In such database your index would be useless. However, MongoDB stores also undefined values in the index.

    Login or Signup to reply.
  2. The case is basically about “indexing an optional key”.

    An index on an optional key is complete:
    It is complete in the sense that the number of documents in the index and collection will be the same. It is also to note that for the documents wherein this key is either “not existing” or “null” will also be included in the index. The reason behind this kind of indexing is that by default indexes are non-unique or in other words the duplicated key values are also included in the index.

    Now on to this case:

    In this case the optional key is “field_A”. Let us take a sample collection as below. It has 4 documents and the index on the key “field_A” will also have 4 documents. The 3 documents be indexed under the same value – null and 1 document with a value 1. This behaviour has been explained under the section “Sparse index” in the manual:

    Sample collection :
    [
      { field_A: 1, field_B: ‘a’ },
      { field_B: ‘b’ },
      { field_B: ‘c’ },
      { field_B: ‘d’ } ]
    
    Sample index - field_A :
    [ { field_A: null } pointed to the document { field_B: ‘a’ }  ,
      { field_A: null } pointed to the document { field_B: ‘b’ }  ,
      { field_A: null } pointed to the document { field_B: ‘c’ }  ,
      { field_A: 1 } pointed to the document    { field_A: 1, field_B: ‘d’ } ]
    

    First query – { field_A: "some_value" }

    Now the query under this context, { field_A: "some_value" } will be supported by this index. In a sample case, while querying for a value 1, as the indexes are ordered, it will find the value 1 quickly and will return the document. Therefore this query should be seen performant.

    Second query – { field_A: { $exists: true } }

    This query will not use this index therefore it will perform a collection scan. Thus it will take time.
    The reason for not using the index has been explained in the manual under section “$exists” as below :

    “Queries that use { $exists: true } on fields that use a non-sparse index or that use { $exists: true } on fields that are not indexed examine all documents in a collection.”

    In this case the index will not be used since the index is a non-sparse index. This query calls for a sparse-index.

    Sparse index
    https://www.mongodb.com/docs/manual/core/index-sparse/#std-label-index-type-sparse:~:text=%E2%86%92%20MongoDB%20Manual-,Sparse%20Indexes,-Sparse%20indexes%20only

    $exists
    https://www.mongodb.com/docs/manual/reference/operator/query/exists/#:~:text=a%20FETCH.-,Queries%20that%20use,-%7B%20%24exists%3A%20true%20%7D

    Thanks
    WeDoTheBest4You

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search