skip to Main Content

Let say I have a collection like this:

[
  {
    "_id": 1,
    "list": [
      {
        "a": 1
      },
      {
        "a": 2
      }
    ]
  },
  {
    "_id": 2,
    "list": [
      {
        "a": 3
      },
      {
        "a": 4
      }
    ]
  }
]

I have a few stages of a pipeline, some pre-processing, an unwind and a lookup, then post processing. I would like the lookup to be a self lookup on the current state of the collection rather than the original. This is the current aggregation pipeline that I have working:

[{ // pre-process with a project and maybe more
    "$project": {
      "_id": true,
      "list": {
        "$filter": {
          "input": "$list",
          "as": "lst",
          "cond": {"$in": ["$$lst.a", [1, 2, 4]]}
  }}}
  },{
    "$unwind": "$list"
  },{
    "$lookup": {
      // would like to reference the result of the above project/unwind here
      // instead of populating the pipeline argument
      "from": "collection", 
      "localField": "_id",
      "foreignField": "_id",
      "as": "results",
      "pipeline": [{
          "$project": {
            "_id": true,
            "list": {
              "$filter": {
                "input": "$list",
                "as": "lst",
                "cond": {"$in": ["$$lst.a", [1, 2, 4]]}
        }}}
        },{
          "$unwind": "$list"
        }
    ]}
  },{
    "$unwind": "$results"
  },
  ... // do some more filtering and projections
]

This has the desired output of comparing every subdocument in list with ever other subdocument in list with the same _id with the following output while filtering the undesired subdocuments:

[
  {
    "_id": 1,
    "list": {
      "a": 1
    },
    "results": {
      "_id": 1,
      "list": {
        "a": 1
      }
    }
  },
  {
    "_id": 1,
    "list": {
      "a": 1
    },
    "results": {
      "_id": 1,
      "list": {
        "a": 2
      }
    }
  },
  {
    "_id": 1,
    "list": {
      "a": 2
    },
    "results": {
      "_id": 1,
      "list": {
        "a": 1
      }
    }
  },
  {
    "_id": 1,
    "list": {
      "a": 2
    },
    "results": {
      "_id": 1,
      "list": {
        "a": 2
      }
    }
  },
  {
    "_id": 2,
    "list": {
      "a": 4
    },
    "results": {
      "_id": 2,
      "list": {
        "a": 4
      }
    }
  }
]

But this is very inefficient since mongo has to execute the copy and pasted pre-processing pipeline twice on the collection. How can I reference the current state of the pipeline to do a self lookup?

Ultimately what I want to do is do some calculations comparing each subdocument with ever other subdocument, the cartesian product on the list subdocuments (with the same _id).

2

Answers


  1. Use $facet:

    db.collection.aggregate([
       { 
          "$project": {
             "_id": true,
             "list": {
                "$filter": {
                   "input": "$list",
                   "as": "lst",
                   "cond": { "$in": ["$$lst.a", [1, 2, 4]] }
                }
             }
          }
       },
       {
          $facet: {
             current: [],
             unwind: [{ "$unwind": "$list" }]
          }
       }
    ])
    
    Login or Signup to reply.
  2. One option is to use $reduce and $map:

    db.collection.aggregate([
      {$project: {
          list: {$filter: {
              input: "$list",
              cond: {$in: ["$$this.a", [1, 2, 4]}
          }}
      }},
      {$project: {
          results: {$reduce: {
              input: "$list",
              initialValue: [],
              in: {$concatArrays: [
                  "$$value",
                  {$map: {
                      input: "$list",
                      as: "item",
                      in: {
                        _id: "$_id",
                        origA: "$$this.a",
                        list: {a: "$$item.a"}
                      }
                  }}
              ]}
          }}
      }},
      {$unwind: "$results"},
      {$project: {
          list: {a: "$results.origA"},
          results: {_id: "$results._id", list: "$results.list"}
      }}
    ])
    

    See how it works on the playground example

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search