skip to Main Content

I am trying to find the max of a value in a range of dates. The aggregate query I use has a match on indexed column _id. But the query takes too long and the explain plan tells me its going of a COLLSCAN and not an index scan. Can you please suggest why it wont make use of the index on _id?

Would it help if I created another index on colId?

    {$match:{_id:{ $regex: 'regex'}}},
    {$match:{$and:[{"colId":'DATA'}]}},
    {$unwind:"$data"},
    {$match:{$and:[{"data.time":{$gte:ISODate("xyz"),$lte:ISODate("zyx")}}]}},
    {$match:{$and: [{ "data.col": { $exists: true}}] }},
    {$group:{_id:"$data.time",maxCol:{$max:"$data.col"}}} ,
    {$sort:{"maxCol":-1,_id:-1}},
    {$limit:1}
    ])

Explain plan snippet:

                "winningPlan" : {
                        "stage" : "LIMIT_SKIP",
                        "inputStage" : {
                                "stage" : "SORT",
                                "sortPattern" : {
                                        "_id" : -1,
                                        "maxCol" : -1
                                },
                                "inputStage" : {
                                        "stage" : "SUBSCAN",
                                        "inputStage" : {
                                                "stage" : "HASH_AGGREGATE",
                                                "inputStage" : {
                                                        "stage" : "SUBSCAN",
                                                        "inputStage" : {
                                                                "stage" : "PROJECTION",
                                                                "inputStage" : {
                                                                        "stage" : "COLLSCAN"
                                                                }
                                                        }
                                                }
                                        }
                                }
                        }

This is on DocumentDB (mongo4)

2

Answers


  1. Chosen as BEST ANSWER

    DocumentDB does not seem to support index scan automatically when there is a regular expression match involved in the indexed column. Document suggests that we use hints to nudge AWS to use those indexes (wonder why the spoon-feeding).

    db.collection.aggregate([all_your_filters], {hint: {_id: 1}});

    But in my case adding the hint threw errors because it did not like me using non-capturing groups (?:x) in my regex. So I had to remove groups from my regex for the hints to work.


  2. I think regular expression cannot use indexes. $match works also on array, try this one:

    db.collection.aggregate([
       {
          $match: {
             "colId": 'DATA',
             "data.time": { $gte: ISODate("xyz"), $lte: ISODate("zyx") },
             "data.col": { $exists: true }
          }
       },
       { $match: { _id: { $regex: 'regex' } } },
       { $unwind: "$data" },
       { $group: { _id: "$data.time", maxCol: { $max: "$data.col" } } },
       { $sort: { "maxCol": -1, _id: -1 } },
       { $limit: 1 }
    ])
    

    As consequence put an index on {colId: 1, "data.time": 1} or {colId: 1, "data.time": 1, "data.col": 1}

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search