skip to Main Content

I have a collection that has a createdAt field in it. When I get multiple documents from this collection, I use this field to sort them from the newest to the oldest. The thing is, that multiple documents can somehow be created at the same millisecond and therefore have the exact same date. Because of that, I need my sorting to have some tie-breaker for those situations (I don’t care what order they will be, but it has to be consistent). I tried using the _id for that, and sorting using {createdAt: -1, _id: -1}, but it creates problems with the use of indexes with some queries, making some very common queries slower and inefficient.

Then, I thought that instead of saving createdAt as a Date object, I can just save it as a string (ISO 8601) and then add a random string at the end. That way even if multiple documents have the same Date, thanks to the random string, they will still have a consistent order. It’s like a built-in tie-breaker that makes it very simple to work with sorting or getting all of the documents that were created before some other document (something that I use a lot for pagination). The only thing that it adds to my code is that when I want to get the actual date, I need to remove that random string and convert it to Date, but this is very easily done with new Date(createdAt.split('Z')[0] + 'Z').

Does this approach has any downsides? Something that I need to consider?

2

Answers


  1. Sounds like you might want to consider using Universally Unique Lexicographically Sortable Identifier (ULID).

    It will create a random ID, based on the timestamp. It encodes the timestamp and appends a random part to avoid conflicts for the exact same date.

    const {ulid} = require('ulid');
    
    ulid(new Date('2022-01-01').valueOf()); // 01FR9EZ700AT3P6EA9PW5K9NQ8
    ulid(new Date('2022-01-01').valueOf()); // 01FR9EZ700NQ1RP84E34S1W6X1
    

    Note, if I generate a ulid for the same data again, the first 10 bytes of the string will remain the same, the rest will be changed only. This is because of the ULID format:

     01AN4Z07BY      79KA1307SR9X4MV3
    
    |----------|    |----------------|
     Timestamp          Randomness
       48bits             80bits
    

    So in your case, instead of having a compound index for {createdAt: -1, _id:1}, you’ll have a simple index based on 1 field that stores a ULID.

    But wait, how do you search based on some date then? How do you decode the original date back?

    Searching documents larger than a date:

    const {encodeTime} = require('ulid');
    
    const datePrefix = encodeTime(new Date('2022-01-01').valueOf(), 10);
    
    db.collection.find({ulidField: {$gte: datePrefix}})
    

    Returning the original date back:

    const {decodeTime} = require('ulid');
    
    new Date(decodeTime('01FR9EZ700AT3P6EA9PW5K9NQ8')); // Sat Jan 01 2022 01:00:00 GMT+0100 (Central European Standard Time)
    
    Login or Signup to reply.
  2. The index bounds are not optimal, as you see in the explain plan:

    indexBounds: {
        tags: [ '["some_tag", "some_tag"]' ],
        createdAt: [ '[MaxKey, MinKey]' ],
        _id: [ '[MaxKey, MinKey]' ]
    }
    

    Rewrite the query and try this one:

    db.questions.find(
       {
          $or: [
             { tags: "some_tag", createdAt: { $lt: ISODate('2022-10-12T17:39:49.774Z') } },
             { tags: "some_tag", createdAt: ISODate('2022-10-12T17:39:49.774Z'), _id: 'F3zcf4Ll0R' }
          ]
       }
    ).limit(20).sort({ createdAt: -1, _id: -1 }).explain("allPlansExecution")
    

    When you create index as db.questions.createIndex({ tags: 1, createdAt: -1 })
    then you get this plan:

    {
      stage: 'OR',
      inputStages: [
        {
          stage: 'FETCH',
          filter: {
            '$and': [
              { createdAt: { '$eq': ISODate("2022-10-12T17:39:49.774Z") } },
              { tags: { '$eq': 'some_tag' } }
            ]
          },
          inputStage: {
            stage: 'IXSCAN',
            keyPattern: { _id: 1 },
            indexName: '_id_',
            isMultiKey: false,
            multiKeyPaths: { _id: [] },
            isUnique: true,
            isSparse: false,
            isPartial: false,
            indexVersion: 2,
            direction: 'forward',
            indexBounds: { _id: [ '["F3zcf4Ll0R", "F3zcf4Ll0R"]' ] }
          }
        },
        {
          stage: 'IXSCAN',
          keyPattern: { tags: 1, createdAt: -1 },
          indexName: 'tags_1_createdAt_-1',
          isMultiKey: false,
          multiKeyPaths: { tags: [], createdAt: [] },
          isUnique: false,
          isSparse: false,
          isPartial: false,
          indexVersion: 2,
          direction: 'forward',
          indexBounds: {
            tags: [ '["some_tag", "some_tag"]' ],
            createdAt: [ '(new Date(1665596389774), new Date(-9223372036854775808)]' ]
          }
        }
      ]
    }
    

    Field _id is always unique (and always has an index), so it makes sense for MongoDB to query on this field and run OR with the second condition.

    Thus, you could also write:

    db.questions.find(
       {
          $or: [
             { tags: "some_tag", createdAt: { $lt: ISODate('2022-10-12T17:39:49.774Z') } },
             { _id: 'F3zcf4Ll0R' }
          ]
       }
    ).limit(20).sort({ createdAt: -1, _id: -1 }).explain("allPlansExecution")
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search