skip to Main Content

I have a collection of documents that each contain arrays. the output i want is the unwinded array elements filtered down by a certain criteria.

i can do this by doing:

db.collection.aggregate([
 {
   $unwind: "$myArrayField"
 },
 {
   $match: {
     "myArrayField.myCriteriaField": "myValue"
   }
 }
])

but it seems wasteful to unwind everything, so instead i do

db.collection.aggregate([
 {
   $match: {
     "myArrayField.myCriteriaField": "myValue"
   }
 },
 {
   $unwind: "$myArrayField"
 },
 {
   $match: {
     "myArrayField.myCriteriaField": "myValue"
   }
 }
])

to filter down to viable document candidates, unwind those, and then filter down the unwound elements again since there will be elements that don’t match.

alternatively, i can do

db.collection.aggregate([
 {
  $project: {
    myArrayField: {
      $filter: {
        input: "$myArrayField",
        as: "element",
        cond: { $eq: ["$$element.myCriteriaField", "myValue"] }
      }
    }
  }
 },
 {
  $unwind: "$myArrayField"
 }
])

or even

db.collection.aggregate([
  {
   $match: {
     "myArrayField.myCriteriaField": "myValue"
   }
 },
 {
  $project: {
    myArrayField: {
      $filter: {
        input: "$myArrayField",
        as: "element",
        cond: { $eq: ["$$element.myCriteriaField", "myValue"] }
      }
    }
  }
 },
 {
  $unwind: "$myArrayField"
 }
])

which one is the most efficient? i don’t understand enough about how the aggregations are run to know what variables it could depend on (collection size, document size, etc).

lastly, the match unwind match seems pretty straightforward but it also just feels wrong to have to do it like that so i’m wondering if i’m missing something.

2

Answers


  1. First of all you should have an index on the field.

    Then you should limit the result as early as possible meaning the $match stage should come first.

    The rest is more a matter of taste.

    Login or Signup to reply.
  2. First, You must sure have an index on the field. Using Explain to check index hit with $match first stage.

    Now, we explain your 2 suggestion: $project $filter $unwind vs $match $project $filter $unwind:

    • $project $filter $unwind: with first stage $project, mongodb is doing a collection scan, outputting ALL documents in that collection with that field. Now, It’s a full collection scan and slow

    • $match $project $filter $unwind: with first stage $match, mongodb is doing an index scan instead of a collection if It hit an index. After that, stage $project will do on a limit document that matched. Now, it a index scan and faster because mongodb only do on some document.

    Final, with your 2 option, the second is better

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search