skip to Main Content

I have an aggregation pipeline in which I’m trying to find the unique values within an array property and also the number of instances of each value within each document’s array.

I arrive at a stage where the documents look like this:

{
    _id: ObjectId("5d8cac657d2d1e0145268fb4"),
    values: [
        "5d8a2c3d4768d9660d3ba383",
        "5d8a1d4f4768d951cb6c8989"
    ]
},
{
    _id: ObjectId("5d8caf4039321e7b14061a46"),
    values: [
        "5d8a2c3d4768d9660d3ba383",
        "5d8c835a39321e7b15792353"
    ]
}

and from here I’d like to end up with a result like the following:

{
    values: {
        "5d8a2c3d4768d9660d3ba383": 2,
        "5d8a1d4f4768d951cb6c8989": 1,
        "5d8c835a39321e7b15792353" : 1
    }  
}

or 

{
    values: [
        { "id": "5d8a2c3d4768d9660d3ba383", "count" : 2 },
        { "id": "5d8a1d4f4768d951cb6c8989", "count" : 1 },
        { "id": "5d8c835a39321e7b15792353", "count" : 1 }
    ]
}

I’m not too concerned with the format of the result, as long as I have the unique values and their counts I can work with it — whatever the most performant approach is is fine. I can get the unique values by using a $group stage with $push and $reduce but I’m not able to also get the counts that way — could someone point me in the right direction?

2

Answers


  1. If you want the first option you can try this query:

    • First $unwind the array values to be able to get individualy each element.
    • Then $group to get the count for each element.
    • And the re-grouping again but using _id as null to get all values into one single array. Every element here will be an object with k and v attributes. That will be used in the next stage.
    • Last stage is $project to not output the _id value (wich in null) and to use $arrayToObject. Using k and v in the previous stage it created an array with objects valid to be converted into the desired object.
    db.collection.aggregate([
      {
        "$unwind": "$values"
      },
      {
        "$group": {
          "_id": "$values",
          "count": {
            "$sum": 1
          }
        }
      },
      {
        "$group": {
          "_id": null,
          "values": {
            "$push": {
              "k": "$_id",
              "v": "$count"
            }
          }
        }
      },
      {
        "$project": {
          "_id": 0,
          "values": {
            "$arrayToObject": "$values"
          }
        }
      }
    ])
    

    Example here

    And for the second option you can try this query:

    • The same steps as before.
    • The re-grouping here is using $$ROOT to get the entire object. As here is not needed anything else you can set the object as it is.
    • Last stage is $project only to not output the _id value.
    db.collection.aggregate([
      {
        "$unwind": "$values"
      },
      {
        "$group": {
          "_id": "$values",
          "count": {
            "$sum": 1
          }
        }
      },
      {
        "$group": {
          "_id": null,
          "values": {
            "$push": "$$ROOT"
          }
        }
      },
      {
        "$project": {
          "_id": 0
        }
      }
    ])
    

    Example here

    Edit: I didn’t notice te second example has id and not _id. If it is relevant you can use this query instead which is the same but instead of using $$ROOT to set the entire object at it is, it create the id/count object.

    Login or Signup to reply.
  2. You could use $facet as it allows you to:

    Processes multiple aggregation pipelines within a single stage on the same set of input documents.

    While not an overly complex query it means you can $unwind and $group in the same $facet with the output of the sub-pipelines added to a values array like so:

    db.collection.aggregate([
      {
        $facet: {
          values: [
            {
              $unwind: "$values"
            },
            {
              $group: {
                _id: "$values",
                count: {
                  $sum: 1
                }
              }
            }
          ]
        }
      }
    ])
    

    See HERE for a working example.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search