Currently by default when an expensive query is performed we apply a limit (as well as a time limit) to prevent a query from taking too long. We have a few simple indexes and are not ready to add any new indexes including compound indexes by date as there are many date fields and combining them with every possible search field would increase the number of indexes needed exponentially.
That said, what we observe is that when a executing a query such the below, the results arrive by earliest insertion date:
db.getCollection("PeopleCollection").find(
{
"relative.resourceId" : JUUID("6DE84AC1-236F-429A-853F-9BD155C6FFEB")
}
)
.limit(10);
To get the latest set of 10 records rather than the earliest, we could add a sort such as .sort({"createdDateTime": -1.0})
but this causes the query to run much longer since the entire set of matched records has to be retrieved and sorted in memory. What would be ideal is if we could set it up so that the default retrieval is from the latest rather than earliest. My understanding is that since MongoDB’s use of the _id index by default when no filtering is required, it does some similar retrieval method when filters are added to the query, hence the retrieval by default from earliest to latest. Is there not some way to tell MongoDB to instead retrieve those from latest to earliest or does that require re-arranging the documents on disk via a clustered index?
2
Answers
Add the index for mongo collection like this:
TL;DR: create a compound index that provides the sort. In your case:
This index will prevent the in-memory sort that you mention and should perform much better.
Clarifications
You’ve made a few other claims that are worth addressing as well.
This is both incorrect and unrelated to your situation anyway.
It is incorrect in the sense that when no filter is added to the query then the index on
_id
will NOT be used. Instead, since you are requesting all of the data, the database will just read all of the documents in the collection directly.It is unrelated to your situation since you are providing a filter in your query.
There is also an additional nuance that
_id
values do not have to correspond with the ordering of inserts (though the automatic values typically do) and that the database is free to return documents in any order when no sort is requested.It is a little unclear what you mean here. The way you do this is by adding the
.sort(...)
to the query, which is what tells the database in what order to return the results. This can be the same way in which it retrieves them, but doesn’t have to be. Did you mean something else?This, too, is a bit unclear. Using the index at the top of this answer will allow the database to avoid the in memory sort by retrieving the values in order. That seems to be what you want here.
Views
Going back to this once more, if you wanted you could create a view on this collection that had the specified sort order requested. This would change the "default" result order for queries against the view.
In your case the view would be defined along the lines of:
Then you would change the collection/view that the query was targeting to be:
And the results would both come back ordered and, assuming the index mentioned earlier was created, would be very efficient.