I’ve been trying to extract one record each 1 hour from a collection with 1 record each 10 seconds on the datetime (Date). The document is like this:
datetime: 2021-11-26T15:08:20.000+00:00
owner_id: "INTERNAL"
motor_id: 24
The datetime document field is Date() type.
On this collection I’ve 1 record each 10 seconds on datetime field (several days), but I would like to extract just one record each 1 hour datetime interval. How could I archive this?
Thanks in advance for any tip, help or direction to the right documentation.
I was trying to follow this: https://mongoing.com/docs/reference/operator/aggregation/filter.html with no success. Not sure how to increment the hour.
EDIT: Some more information about the collection:
collection1 with 26.303 records like this:
{
"_id": {
"$oid": "644b34a3756b9c1dc169e20d"
},
"datetime": {
"$date": {
"$numberLong": "1637939300000"
}
},
"customer_id": "INTERNAL",
"owner_id": "INTERNAL",
"GMHP": -0.54,
"AMHP": 198.19,
"OMHP": 428.883544921875,
"AROP": 148.57066345214844,
"OROP": 102.5451889038086
}
Each document has a 10 seconds interval stored in datetime field. I’ve plans to change those documents to time-series in the future.
My goal is to select the documents on each hour, e.g:
1st document datetime: 2021-11-26T15:08:20.000+00:00
2nd document datetime: 2021-11-26T15:09:20.000+00:00
and so on…
I could safely ignore all fields between T15:08:30 to T15:09:10.
2
Answers
After some struggle I was trying to make the query sent by nimrod serok to work both on Mongo and DocumentDB. Then I realized how limited problematic is DocumentDB when using PyMongo or vanilla MongoDB driver.
The
$set
is not supported on DocumentDB, so I had to replace it by$addFields
. After that I realizeddateToParts
is also not supported because this function expects a string format and I had a Date() type.If by any chance anyone wants to achieve similar results on DocumentDB here is a workaround. Probably there are better solutions for that, however it worked for me with the same results as Mongo query.
(Im using PyMongo)
the
entry_date
determines the first datetime from the collection, it's optional. You can replace theminute
andsecond
variables to fit your range needs.I hope it can help the DocumentDB users as well.
One option is to simply use
$dateToParts
. If you have a document for each 10 seconds, the$match
step will return the first document for each hour:See how it works on the playground example