
Let the books collection be:

db.books.insertMany([
    { "name": "foo", "category": 0, publishedAt: ISODate("2008-09-14T00:00:00Z") },
    { "name": "bar", "category": 1, publishedAt: ISODate("1945-08-17T00:00:00Z") },
    { "name": "baz", "category": 1, publishedAt: ISODate("2002-03-01T00:00:00Z") },
    { "name": "qux", "category": 2, publishedAt: ISODate("2002-01-21T00:00:00Z") },
    { "name": "quux", "category": 4, publishedAt: ISODate("2018-04-18T00:00:00Z") },
])

For each year from 2000 to 2010 inclusive, I want to calculate the total number of books published that year and how many of them were published in each category. The category is defined as an enum with 5 variants, stored as an integer in the MongoDB schema, e.g. Fiction, Fantasy, Classic, Horror, Comic.

I achieved the other requirements with this aggregation pipeline:

db.books.aggregate([
    {
        $match: {
            publishedAt: {
                $gte: ISODate("2000-01-01T00:00:00Z"),
                $lt: ISODate("2011-01-01T00:00:00Z"),
            },
        },
    },
    {
        $group: {
            _id: {
                $year: "$publishedAt",
            },
            totalCount: {
                $count: {},
            },
        },
    },
    {
        $sort: {
            _id: 1,
        },
    },
]);

which produces the following output:

[
    {
        _id: 2002,
        totalCount: 2,
    },
    {
        _id: 2008,
        totalCount: 1,
    },
]
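The $match stage expresses "2000 to 2010 inclusive" as a half-open interval: everything on or after 2000-01-01 and strictly before 2011-01-01, so every instant of 31 December 2010 is still counted. A minimal plain-JavaScript sketch of the same bounds check, runnable in Node.js rather than in the database (the helpers yearRangeBounds and inRange are hypothetical names, not MongoDB APIs):

```javascript
// Build half-open UTC bounds [gte, lt) covering the whole years fromYear..toYear.
// Hypothetical helper mirroring the $gte/$lt values used in the $match stage.
function yearRangeBounds(fromYear, toYear) {
  return {
    gte: new Date(Date.UTC(fromYear, 0, 1)),  // 1 January of the first year, 00:00 UTC
    lt: new Date(Date.UTC(toYear + 1, 0, 1)), // 1 January of the year AFTER the last one
  };
}

// True when the date lies inside the half-open interval.
function inRange(date, { gte, lt }) {
  return date >= gte && date < lt;
}

const bounds = yearRangeBounds(2000, 2010);
console.log(bounds.gte.toISOString()); // 2000-01-01T00:00:00.000Z
console.log(bounds.lt.toISOString());  // 2011-01-01T00:00:00.000Z
console.log(inRange(new Date("2010-12-31T23:59:59Z"), bounds)); // true
console.log(inRange(new Date("2011-01-01T00:00:00Z"), bounds)); // false
```

Using $lt on the first instant of the following year avoids having to spell out "23:59:59.999" for the last day of the range.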

But I also want a field that holds the number of books in each category as an array. For example:

[
    {
        _id: 2002,
        totalCount: 2,
        categoryCount: [0, 1, 1, 0, 0],
    },
    {
        _id: 2008,
        totalCount: 1,
        categoryCount: [1, 0, 0, 0, 0],
    },
]

The array's length needs to be 5 because category is defined with 5 variants. In the example, the year 2002 has a total of 2 books, which totalCount represents: 1 book in category 1, which is why categoryCount[1] is 1, and likewise 1 book in category 2, so categoryCount[2] is 1.
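To pin down exactly what the pipeline should return, here is a plain-JavaScript sketch (runnable in Node.js, not in MongoDB) that computes the expected output from the sample documents. The function countByYear and the constant NUM_CATEGORIES are hypothetical names, and the sketch assumes every category value is an integer from 0 to 4 that can be used directly as an array index:

```javascript
// Sample documents from the question (publishedAt as a JS Date instead of ISODate).
const books = [
  { name: "foo", category: 0, publishedAt: new Date("2008-09-14T00:00:00Z") },
  { name: "bar", category: 1, publishedAt: new Date("1945-08-17T00:00:00Z") },
  { name: "baz", category: 1, publishedAt: new Date("2002-03-01T00:00:00Z") },
  { name: "qux", category: 2, publishedAt: new Date("2002-01-21T00:00:00Z") },
  { name: "quux", category: 4, publishedAt: new Date("2018-04-18T00:00:00Z") },
];

const NUM_CATEGORIES = 5; // the enum has 5 variants, stored as integers 0..4

// Group books published from fromYear to toYear (inclusive) by UTC year.
function countByYear(docs, fromYear, toYear) {
  const byYear = new Map();
  for (const doc of docs) {
    const year = doc.publishedAt.getUTCFullYear(); // like $year, which defaults to UTC
    if (year < fromYear || year > toYear) continue; // like the $match stage
    if (!byYear.has(year)) {
      byYear.set(year, {
        _id: year,
        totalCount: 0,
        categoryCount: new Array(NUM_CATEGORIES).fill(0),
      });
    }
    const entry = byYear.get(year);
    entry.totalCount += 1;
    entry.categoryCount[doc.category] += 1; // the category integer is the array index
  }
  // like { $sort: { _id: 1 } }
  return [...byYear.values()].sort((a, b) => a._id - b._id);
}

console.log(JSON.stringify(countByYear(books, 2000, 2010)));
// [{"_id":2002,"totalCount":2,"categoryCount":[0,1,1,0,0]},{"_id":2008,"totalCount":1,"categoryCount":[1,0,0,0,0]}]
```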


Answers


  1. Using $accumulator

    db.books.aggregate([
      {
        $match: {
          publishedAt: {
            $gte: ISODate("2000-01-01T00:00:00Z"),
            $lt: ISODate("2011-01-01T00:00:00Z")
          }
        }
      },
      {
        $group: {
          _id: {
            $year: "$publishedAt"
          },
          totalCount: {
            $count: {}
          },
          categoryCount: {
            $accumulator: {
              init: function () {
                return [0, 0, 0, 0, 0];                 // start with one zero counter per category
              },
              accumulate: function (state, category) {
                state[category] = state[category] + 1;  // increment the counter at the category's index
                return state;
              },
              accumulateArgs: ["$category"],
              merge: function (state1, state2) {
                // if partial states have to be merged, add the values at each index
                for (var i = 0; i < state1.length; i++) {
                  state1[i] = state1[i] + state2[i];
                }
                return state1;
              },
              lang: "js"
            }
          }
        }
      },
      {
        $sort: {
          _id: 1
        }
      }
    ]);
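$accumulator may accumulate separate subsets of a group's documents (for example on a sharded cluster) and then combine the partial states with merge, so merge has to return the element-wise sum of both arrays. This plain-JavaScript sketch mirrors the intended init/accumulate/merge logic outside MongoDB and checks that splitting the input and merging gives the same result as one pass. The input categories and the split point are arbitrary, made-up values for illustration; they do not model how the server actually partitions data:

```javascript
// Mirrors the intended init/accumulate/merge functions of the $accumulator.
function init() {
  return [0, 0, 0, 0, 0]; // one counter per category (0..4)
}

function accumulate(state, category) {
  state[category] = state[category] + 1;
  return state;
}

function merge(state1, state2) {
  // element-wise sum of two partial states
  for (let i = 0; i < state1.length; i++) {
    state1[i] = state1[i] + state2[i];
  }
  return state1;
}

// Categories of the books in one year (hypothetical input).
const categories = [1, 2, 1, 4, 0, 1];

// One pass over all documents.
const single = categories.reduce(accumulate, init());

// Two partial states merged together (arbitrary split).
const left = categories.slice(0, 3).reduce(accumulate, init());
const right = categories.slice(3).reduce(accumulate, init());
const merged = merge(left, right);

console.log(single); // [ 1, 3, 1, 0, 1 ]
console.log(merged); // [ 1, 3, 1, 0, 1 ]
```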
    
  2. You can achieve this result without $accumulator by using two $group stages, first by year and category and then by year only, and then applying a few aggregation operators to reshape the result into the desired format.

    The resulting query is long and looks quite complicated, but it works on your example data:

    db.books.aggregate([
      {
        $match: {
          publishedAt: {
            $gte: ISODate("2000-01-01T00:00:00Z"),
            $lt: ISODate("2011-01-01T00:00:00Z")
          }
        }
      },
      {
        $group: {
          _id: {
            year: {
              $year: "$publishedAt"
            },
            category: "$category"
          },
          totalCount: {
            $count: {}
          }
        }
      },
      {
        $group: {
          "_id": "$_id.year",
          "totalCount": {
            "$sum": "$totalCount"
          },
          "categoryCount": {
            "$push": {
              "k": {
                "$toString": "$_id.category"
              },
              "v": "$totalCount"
            }
          }
        }
      },
      {
        "$addFields": {
          "categoryCount": {
            "$arrayToObject": "$categoryCount"
          }
        }
      },
      {
        "$addFields": {
          "categoryCount": {
            "$mergeObjects": [
              {
                "0": 0,
                "1": 0,
                "2": 0,
                "3": 0,
                "4": 0
              },
              "$categoryCount"
            ]
          }
        }
      },
      {
        "$addFields": {
          "categoryCount": {
            "$objectToArray": "$categoryCount"
          }
        }
      },
      {
        "$addFields": {
          "categoryCount": {
            "$map": {
              "input": "$categoryCount",
              "as": "x",
              "in": {
                "$mergeObjects": [
                  "$$x",
                  {
                    "k": {
                      "$toInt": "$$x.k"
                    }
                  }
                ]
              }
            }
          }
        }
      },
      {
        "$addFields": {
          "categoryCount": {
            "$sortArray": {
              "input": "$categoryCount",
              "sortBy": {
                "k": 1
              }
            }
          }
        }
      },
      {
        "$addFields": {
          "categoryCount": "$categoryCount.v"
        }
      },
      {
        $sort: {
          _id: 1
        }
      }
    ])
    


    Step-by-step explanation:

    1. $match – your initial filter
    2. $group – pass both year and category into _id to preserve the count for each category
    3. $group – group by year only and collect categoryCount as a list of {k, v} objects, one for each category that appeared in that year
    4. $addFields – combine the list into a single document whose keys are categories and whose values are their counts. Note that keys can only be strings, which is why step 3 casts the category with $toString
    5. $addFields – "densify" the object to fill missing categories with zeros
    6. $addFields – convert the object back into an array of {k, v} pairs, so the values can be extracted later
    7. $addFields – cast the category keys back to numbers so they sort numerically, which matters once you have more than 10 categories
    8. $addFields – sort by category to guarantee the order ($sortArray requires MongoDB 5.2 or later; I'm not sure whether this step is strictly needed)
    9. $addFields – extract the count for each category into a flat list
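Steps 5 to 9 may be easier to follow outside the database. This plain-JavaScript sketch (not MongoDB code) applies the same idea to a stage-4-style object for the year 2002: it fills in the keys 0 to 4 with zeros, turns the keys back into numbers, sorts numerically and keeps only the values. It also shows why step 7 casts the keys: string comparison puts "10" before "2". The input object is a made-up example:

```javascript
// Stage-4-style categoryCount: string keys -> counts (example for the year 2002).
const categoryCount = { "1": 1, "2": 1 };

// Step 5: "densify" by merging into an object that has every category set to 0.
const dense = Object.assign({ "0": 0, "1": 0, "2": 0, "3": 0, "4": 0 }, categoryCount);

// Steps 6 to 9: object -> [k, v] pairs, numeric keys, numeric sort, values only.
const values = Object.entries(dense)
  .map(([k, v]) => [Number(k), v])
  .sort((a, b) => a[0] - b[0])
  .map(([, v]) => v);

console.log(values); // [ 0, 1, 1, 0, 0 ]

// Why step 7 casts keys to numbers: string comparison puts "10" before "2".
console.log(["2", "10", "1"].sort());          // [ '1', '10', '2' ]
console.log([2, 10, 1].sort((a, b) => a - b)); // [ 1, 2, 10 ]
```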

    Try adding these stages to your query one by one to see how it works.
    In fact, my suggestion is not to use aggregation as an end-to-end transformation, but rather to stop at stage 3 or 4 and finish the transformation in your programming language, if you can. Good luck!
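If you stop after stage 4, each result document carries categoryCount as an object such as { "1": 1, "2": 1 }. A minimal sketch of finishing the transformation in application code, written in JavaScript; toCategoryArray and NUM_CATEGORIES are hypothetical names, the stage4 array is a hand-written example rather than real server output, and category values are assumed to be integers from 0 to 4:

```javascript
const NUM_CATEGORIES = 5; // enum variants 0..4

// Convert one stage-4 document into the requested shape.
function toCategoryArray(doc) {
  return {
    _id: doc._id,
    totalCount: doc.totalCount,
    // A missing key means zero books in that category.
    categoryCount: Array.from(
      { length: NUM_CATEGORIES },
      (_, i) => doc.categoryCount[String(i)] ?? 0
    ),
  };
}

// Hand-written example of stage-4 output for the sample data.
const stage4 = [
  { _id: 2002, totalCount: 2, categoryCount: { "1": 1, "2": 1 } },
  { _id: 2008, totalCount: 1, categoryCount: { "0": 1 } },
];

console.log(JSON.stringify(stage4.map(toCategoryArray)));
```

Doing the densifying in application code keeps the pipeline short and makes the number of categories a single constant instead of a hard-coded object inside the query.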
