Let’s suppose a collection has N documents and each document has an array field with M subdocuments.
For a query that uses a $elemMatch
on the field with the M subdocuments to filter and findMany documents of this collection, would the time complexity be equivalent to search a document of N X M documents?
2
Answers
Yes, the time complexity of a query that uses $elemMatch on an array field with M subdocuments in each of N documents is equivalent to searching through 𝑁×𝑀 documents. This is because for each of the N documents, the query needs to evaluate the condition on each of the M subdocuments within the array, resulting in a total of 𝑁×𝑀 evaluations.
Please try to compare the two data models under the context.
aggregate data: Single collection of top level documents with an array key of sub documents
Normalised data: Two separate collections – one to host the top level documents and another to host the sub documents.
The most obvious difference is in accessing the basic data:
While the aggregate model would require a single fetch only, the normalised model would require two separate fetching and a merge. The cost of this join operation in the normalised model is the reason for opting for the aggregate model. Please think about a plain retrieval of the two models. It is quite clear that a normalised model is costly.
Now including a predicate into this scenario does not change the fundamental difference. Including predicates helps to narrow the search, this is the first part of the query to be executed. After this part, in order to get the whole data, the join is still to be performed in a normalised data. But this is not needed in aggregate data.
Edit:1
Now instead of storing data into two separate collections, if the same is stored in the same single collection, this is also very same as the normalised model. Though we call a self join in this case, the cost of join is still the same. The basic data still remains as two separate documents not as a single document.
Join is always the cost to bear if the data is not aggregated.