I have millions of docs containing start and end times of something they’re measuring.
[
{
"start" : ISODate("2024-01-01T00:00:00"),
"end" : ISODate("2024-01-01T01:00:00")
},
{
"start" : ISODate("2024-01-01T01:00:00"),
"end" : ISODate("2024-01-01T02:00:00")
},
{
"start" : ISODate("2024-01-01T03:00:00"),
"end" : ISODate("2024-01-01T04:00:00")
},
]
I would like a way of generating a result set containing only the non-contiguous blocks. For this example, that would look like:
[
{
"start" : ISODate("2024-01-01T00:00:00"),
"end" : ISODate("2024-01-01T02:00:00")
},
{
"start" : ISODate("2024-01-01T03:00:00"),
"end" : ISODate("2024-01-01T04:00:00")
},
]
I’m experienced with using aggregate and $group, but I can’t think of a way to acheive what I need.
Any suggestions would be much appreciated.
2
Answers
You can achieve the expected behaviour through 2
$setWindowFields
. For the first$setWindowFields
, you can find the$max
end
field for each document in the document range of["unbounded", -1]
that is sorted by{start: 1}
. i.e. For a current document, you will search in the document range that has astart
smaller than the current document and get the maxend
. We call this valueprevMax
.Then, by comparing the
prevMax
with current document’sstart
. You can determine if it should be start of a group. We call this booleanisStart
.After that, by using another
$setWindowFields
, you can conditionally find thegrouping
of all documents throughisStart
and in the document range of[current, unbounded]
Finally, using the
grouping
field to find$max
end
inside the grouping.Mongo Playground
Following @ray, if we can assume there are no overlaps in the documents, there is an option to look only at the previous document and the next document for each document, not scanning all previous document for each document (twice). This can also help with matching, which can reduce the number of documents we need to
$set
:See how it works on the playground example