skip to Main Content

Currently by default when an expensive query is performed we apply a limit (as well as a time limit) to prevent a query from taking too long. We have a few simple indexes and are not ready to add any new indexes including compound indexes by date as there are many date fields and combining them with every possible search field would increase the number of indexes needed exponentially.

That said, what we observe is that when a executing a query such the below, the results arrive by earliest insertion date:

db.getCollection("PeopleCollection").find(
    {
        "relative.resourceId" : JUUID("6DE84AC1-236F-429A-853F-9BD155C6FFEB")
    }
)
.limit(10);

To get the latest set of 10 records rather than the earliest, we could add a sort such as .sort({"createdDateTime": -1.0}) but this causes the query to run much longer since the entire set of matched records has to be retrieved and sorted in memory. What would be ideal is if we could set it up so that the default retrieval is from the latest rather than earliest. My understanding is that since MongoDB’s use of the _id index by default when no filtering is required, it does some similar retrieval method when filters are added to the query, hence the retrieval by default from earliest to latest. Is there not some way to tell MongoDB to instead retrieve those from latest to earliest or does that require re-arranging the documents on disk via a clustered index?

2

Answers


  1. Add the index for mongo collection like this:

    db.collection.createIndex(
      {
          "createdDateTime": -1 // descending index, this is faster in your case
      },
      {
          unique: false, // two createdDateTime can be same
          sparse: true, // when the document has no createdDateTime, no error
          expireAfterSeconds: 2592000 // reduce the index size, after 30 day (60*60*24*30) the index does not contains the document anymore
      }
    )
    
    Login or Signup to reply.
  2. TL;DR: create a compound index that provides the sort. In your case:

    db.getCollection({"PeopleCollection").createIndex(
      { "relative.resourceId": 1, "createdDateTime":1 }
    );
    

    This index will prevent the in-memory sort that you mention and should perform much better.


    Clarifications

    You’ve made a few other claims that are worth addressing as well.

    My understanding is that since MongoDB’s use of the _id index by default when no filtering is required, it does some similar retrieval method when filters are added to the query, hence the retrieval by default from earliest to latest.

    This is both incorrect and unrelated to your situation anyway.

    It is incorrect in the sense that when no filter is added to the query then the index on _id will NOT be used. Instead, since you are requesting all of the data, the database will just read all of the documents in the collection directly.

    It is unrelated to your situation since you are providing a filter in your query.

    There is also an additional nuance that _id values do not have to correspond with the ordering of inserts (though the automatic values typically do) and that the database is free to return documents in any order when no sort is requested.

    Is there not some way to tell MongoDB to instead retrieve those from latest to earliest

    It is a little unclear what you mean here. The way you do this is by adding the .sort(...) to the query, which is what tells the database in what order to return the results. This can be the same way in which it retrieves them, but doesn’t have to be. Did you mean something else?

    or does that require re-arranging the documents on disk via a clustered index?

    This, too, is a bit unclear. Using the index at the top of this answer will allow the database to avoid the in memory sort by retrieving the values in order. That seems to be what you want here.


    Views

    hence the retrieval by default from earliest to latest

    Going back to this once more, if you wanted you could create a view on this collection that had the specified sort order requested. This would change the "default" result order for queries against the view.

    In your case the view would be defined along the lines of:

    db.createView(
      "PeopleView",
      "PeopleCollection",
      [ {$sort: { createdDateTime: -1 } } ]
    )
    

    Then you would change the collection/view that the query was targeting to be:

    db.getCollection("PeopleView").find(
        {
            "relative.resourceId" : JUUID("6DE84AC1-236F-429A-853F-9BD155C6FFEB")
        }
    )
    .limit(10);
    

    And the results would both come back ordered and, assuming the index mentioned earlier was created, would be very efficient.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search