I am using MongoDB to store notifications. A cron job runs every midnight to pick old notifications and delete them. The deletion policy has logic to retain the oldest notification of each combination of "sourceObjectIdentifier" and "notificationType". The aggregated output of query 1 is given to query 2 to form the deletion query.
query 1 –
db.persistentEvent.aggregate([
  {
    "$match": {
      "notificationClass": {"$eq": "category"}
    }
  },
  {
    "$group": {
      "_id": [
        {"sourceObjectIdentifier": "$sourceObjectIdentifier"},
        {"notificationType": "$notificationType"}
      ],
      "notificationId": {"$last": "$notificationId"},
      "notificationClass": {"$last": "$notificationClass"}
    }
  },
  {
    "$project": {
      "_id": 0,
      "notificationId": 1,
      "notificationClass": 1
    }
  },
  {
    "$sort": {
      "sourceObjectIdentifier": 1,
      "notificationType": 1,
      "creationTime": -1
    }
  }
])
query 2 –
{
  "deliveryTime": {
    "$gt": "$datefrom"
  },
  "creationTime": {
    "$lt": "dateto"
  },
  "notificationId": {
    "$nin": [query_1_output]
  }
}
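For context, the glue between the two queries presumably looks something like the following mongosh sketch (dateFrom and dateTo stand in for the placeholders above, and the pipeline is query 1 as written):

// Collect the notification IDs to keep (output of query 1).
const keepIds = db.persistentEvent.aggregate([ /* query 1 pipeline */ ])
    .toArray()
    .map(doc => doc.notificationId);

// Query 2: delete everything in the retention window except the kept IDs.
// dateFrom and dateTo are placeholders for the actual cut-off dates.
db.persistentEvent.deleteMany({
    deliveryTime: { $gt: dateFrom },
    creationTime: { $lt: dateTo },
    notificationId: { $nin: keepIds }
});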
This deletion logic works fine when the number of notifications in the collection is small, but it starts to take a long time when the number of notifications reaches roughly 200k or more.
Can someone please suggest how I can improve the query, and whether query 1 and query 2 can be executed together?
2 Answers
As written, query 1 will scan far more documents than is really necessary.
Also, by the time the sort stage is encountered, the group stage and project stage will have removed all fields except notificationId and notificationClass. This means none of the fields in the sort object will exist, so that sort will accomplish nothing.

To improve this query:
- sort before grouping, on fields that can be served by an index, and
- add notificationId to the index.

If the query executor is able to select and sort the documents based on the index, that will eliminate the need to load non-matching documents from storage, and will avoid an in-memory sort. Both of these will help performance.
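One generic way to check whether that is happening is to run explain on the selective part of the pipeline and confirm that the winning plan contains an IXSCAN stage and no blocking SORT stage. A sketch, assuming an index on the match and sort fields exists:

// Explain the match-and-sort portion and inspect the winning plan:
// an IXSCAN stage means the index drives selection and ordering, and the
// absence of a SORT stage means no in-memory sort is performed.
db.persistentEvent.explain("executionStats").aggregate([
    { $match: { notificationClass: "category" } },
    { $sort: { sourceObjectIdentifier: 1, notificationType: 1, creationTime: 1 } }
]);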
If the documents are significantly larger than the combination of notificationClass, notificationId, sourceObjectIdentifier, notificationType, and creationTime, creating an index containing those fields, following the ESR (Equality, Sort, Range) rule, will permit query 1 to be satisfied using only the information included in the index, making it a covered query. With such an index, a pipeline that sorts on the indexed fields before grouping should perform significantly better than the existing query as the collection grows.
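A minimal sketch of such an index and pipeline, assuming the field names above and that the oldest creationTime per group is the one to retain:

// ESR-ordered index: Equality (notificationClass), then the Sort keys
// (sourceObjectIdentifier, notificationType, creationTime), plus
// notificationId so the aggregation can be covered by the index alone.
db.persistentEvent.createIndex({
    notificationClass: 1,
    sourceObjectIdentifier: 1,
    notificationType: 1,
    creationTime: 1,
    notificationId: 1
});

// Sort on indexed fields *before* grouping, then take the first document
// per (sourceObjectIdentifier, notificationType) group, i.e. the oldest
// by creationTime. Only indexed fields are referenced, so the query can
// be answered from the index.
db.persistentEvent.aggregate([
    { $match: { notificationClass: "category" } },
    { $sort: { sourceObjectIdentifier: 1, notificationType: 1, creationTime: 1 } },
    { $group: {
        _id: {
            sourceObjectIdentifier: "$sourceObjectIdentifier",
            notificationType: "$notificationType"
        },
        notificationId: { $first: "$notificationId" }
    } },
    { $project: { _id: 0, notificationId: 1 } }
]);

With creationTime ascending, $first picks the oldest notification per group, which matches the stated retention policy; flip the sort direction (and keep $first) if the newest should be retained instead.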
It is not clear how you define "newest notification": by notificationId, or by creationTime?

First ensure that you have an index on notificationClass. A compound index on {sourceObjectIdentifier: 1, notificationType: 1} may also help.

Use $setWindowFields to find the newest notification of each combination (rank = 1) and delete all the others. The command below treats the notification with the newest creationTime as the "newest notification" (Mongo Playground).
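A minimal sketch of that approach, assuming MongoDB 5.0 or later (required for $setWindowFields) and the collection and field names from the question:

// Indexes suggested in this answer.
db.persistentEvent.createIndex({ notificationClass: 1 });
db.persistentEvent.createIndex({ sourceObjectIdentifier: 1, notificationType: 1 });

// Rank notifications within each (sourceObjectIdentifier, notificationType)
// partition, newest creationTime first; rank 1 is the notification to keep.
const idsToDelete = db.persistentEvent.aggregate([
    { $match: { notificationClass: "category" } },
    { $setWindowFields: {
        partitionBy: {
            sourceObjectIdentifier: "$sourceObjectIdentifier",
            notificationType: "$notificationType"
        },
        sortBy: { creationTime: -1 },
        output: { rank: { $rank: {} } }
    } },
    { $match: { rank: { $gt: 1 } } },            // everything except the newest per group
    { $project: { _id: 0, notificationId: 1 } }
]).toArray().map(doc => doc.notificationId);

// Delete all but the retained notification of each combination.
// The deliveryTime/creationTime date conditions from query 2 could be added
// to this filter if the date window should still apply.
db.persistentEvent.deleteMany({ notificationId: { $in: idsToDelete } });

If, as stated in the question, the oldest notification of each combination should be kept instead, change sortBy to { creationTime: 1 }.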