Mongodb - Mongo - aggregate docs containing contiguous time ranges into minimum document number

SamNiconi
January 31, 2024
141 views
0 votes
2 Answers

I have millions of docs containing start and end times of something they’re measuring.

[
   {
    "start" : ISODate("2024-01-01T00:00:00"),
    "end" :   ISODate("2024-01-01T01:00:00")
   },
   {
    "start" : ISODate("2024-01-01T01:00:00"),
    "end" :   ISODate("2024-01-01T02:00:00")
   },
   {
    "start" : ISODate("2024-01-01T03:00:00"),
    "end" :   ISODate("2024-01-01T04:00:00")
   },
]

I would like a way of generating a result set containing only the non-contiguous blocks. For this example, that would look like:

[
   {
    "start" : ISODate("2024-01-01T00:00:00"),
    "end" :   ISODate("2024-01-01T02:00:00")
   },
   {
    "start" : ISODate("2024-01-01T03:00:00"),
    "end" :   ISODate("2024-01-01T04:00:00")
   },
]

I’m experienced with using aggregate and $group, but I can’t think of a way to acheive what I need.

Any suggestions would be much appreciated.

Answers

You can achieve the expected behaviour through 2 $setWindowFields. For the first $setWindowFields, you can find the $max end field for each document in the document range of ["unbounded", -1] that is sorted by {start: 1}. i.e. For a current document, you will search in the document range that has a start smaller than the current document and get the max end. We call this value prevMax.

Then, by comparing the prevMax with current document’s start. You can determine if it should be start of a group. We call this boolean isStart.

After that, by using another $setWindowFields, you can conditionally find the grouping of all documents through isStart and in the document range of [current, unbounded]

Finally, using the grouping field to find $max end inside the grouping.

db.collection.aggregate([
  {
    "$setWindowFields": {
      "sortBy": {
        "start": 1
      },
      "output": {
        "prevMax": {
          "$max": "$end",
          "window": {
            "documents": [
              "unbounded",
              -1
            ]
          }
        }
      }
    }
  },
  {
    $set: {
      isStart: {
        $lt: [
          "$prevMax",
          "$start"
        ]
      }
    }
  },
  {
    "$setWindowFields": {
      "sortBy": {
        "start": 1
      },
      "output": {
        "grouping": {
          "$max": {
            "$cond": {
              "if": {
                "$eq": [
                  true,
                  "$isStart"
                ]
              },
              "then": "$start",
              "else": -1
            }
          },
          "window": {
            "documents": [
              "unbounded",
              "current"
            ]
          }
        }
      }
    }
  },
  {
    "$group": {
      _id: "$grouping",
      end: {
        $max: "$end"
      }
    }
  },
  {
    "$project": {
      _id: 0,
      start: "$_id",
      end: 1
    }
  }
])

Mongo Playground

Following @ray, if we can assume there are no overlaps in the documents, there is an option to look only at the previous document and the next document for each document, not scanning all previous document for each document (twice). This can also help with matching, which can reduce the number of documents we need to $set:

db.collection.aggregate([
  {$setWindowFields: {
      sortBy: {start 1},
      output: {
        prevMax: {$push: "$end", window: {documents: [-1, 0]}},
        nextMin: {$push: "$start", window: {documents: [0, 1]}}
      }
  }},
  {$match: {$or: [
        {$expr: {$lt: [{$first: "$prevMax"}, "$start"]}},
        {$expr: {$gt: [{$last: "$nextMin"}, "$end"]}},
        {"prevMax.1": {$exists: false}},
        {"nextMin.1": {$exists: false}}
  ]}},
  {$set: {
      isStart: {$or: [
          {$lt: [{$first: "$prevMax"}, "$start"]},
          {$eq: [{$size: "$prevMax"}, 1]}
      ]}
  }},
  {$setWindowFields: {
      sortBy: {start: 1},
      output: {grouping: {
          $max: {$cond: [{$eq: [true, "$isStart"]}, "$start", -1}},
          window: {documents: [-1, 0]}
      }}
  }},
  {$group: {_id: "$grouping", end: {$max: "$end"}}},
  {$project: {_id: 0, start: "$_id", end: 1}}
])

See how it works on the playground example

Please signup or login to give your own answer.

Click here to cancel reply.

Mongodb – Mongo – aggregate docs containing contiguous time ranges into minimum document number

Answers