Let the books collection be:
db.books.insertMany([
  { "name": "foo", "category": 0, publishedAt: ISODate("2008-09-14T00:00:00Z") },
  { "name": "bar", "category": 1, publishedAt: ISODate("1945-08-17T00:00:00Z") },
  { "name": "baz", "category": 1, publishedAt: ISODate("2002-03-01T00:00:00Z") },
  { "name": "qux", "category": 2, publishedAt: ISODate("2002-01-21T00:00:00Z") },
  { "name": "quux", "category": 4, publishedAt: ISODate("2018-04-18T00:00:00Z") },
])
I want to calculate the total number of books published in each year from 2000 to 2010 inclusive, and also the number of books per category for each year. Let category be defined as an enum with 5 variants, represented as an integer in the MongoDB schema, e.g. Fiction, Fantasy, Classic, Horror, Comic.
I achieved the other requirements with this aggregation pipeline:
db.books.aggregate([
  {
    $match: {
      publishedAt: {
        $gte: ISODate("2000-01-01T00:00:00Z"),
        $lt: ISODate("2011-01-01T00:00:00Z"),
      },
    },
  },
  {
    $group: {
      _id: {
        $year: "$publishedAt",
      },
      totalCount: {
        $count: {},
      },
    },
  },
  {
    $sort: {
      _id: 1,
    },
  },
]);
With the following output:
[
  {
    _id: 2002,
    totalCount: 2,
  },
  {
    _id: 2008,
    totalCount: 1,
  },
]
But I also want a field that represents the number of books per category as an array. For example:
[
  {
    _id: 2002,
    totalCount: 2,
    categoryCount: [0, 1, 1, 0, 0],
  },
  {
    _id: 2008,
    totalCount: 1,
    categoryCount: [1, 0, 0, 0, 0],
  },
]
The array's length needs to be 5 since category is defined with 5 variants. In the example, the year 2002 has a total of 2 books, which totalCount represents, and has 1 book in category 1, which is why categoryCount[1] is 1. Likewise, it has 1 book in category 2, so categoryCount[2] is 1.
2 Answers
Using $accumulator
You can achieve this result without $accumulator by using two $group stages, first by year and category and then by year only, and then applying some MongoDB operators to transform the result into the desired format.
The resulting query is long and looks quite complicated, but it works on your example data:
MongoDB playground
Step-by-step explanation:
1. $match - your initial filter
2. $group - pass both year and category into _id to preserve the count for each category
3. $group - group by year only, and collect "categoryCount" as a list of objects, one for each category that appeared in this year
4. $addFields - combine the list into a single document whose keys are categories and whose values are their counts. Note that keys can only be strings, so we must cast them
5. $addFields - "densify" the object to fill missing categories with zeros
6. $addFields - convert the object back to an array, so we can extract the values only
7. $addFields - cast categories back to numbers for correct sorting, in case you have more than 10 of them
8. $addFields - sort by category to ensure order (actually I'm not sure if this step is really needed)
9. $addFields - extract the count for each category into a flat list
Try adding these stages to your query one by one to see how it actually works.
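The steps above can be sketched roughly as follows. This is a reconstruction from the step descriptions, not the original query. It assumes MongoDB 5.2 or later (for $sortArray), exactly 5 categories numbered 0 to 4, and an intermediate `{ category, count }` object shape that is only one possible choice. `new Date(...)` stands in for `ISODate(...)` so the snippet also parses outside mongosh.

```javascript
// Sketch only: assumes MongoDB 5.2+ ($sortArray) and categories 0..4.
const pipeline = [
  // 1. Initial filter: books published from 2000 through 2010.
  { $match: { publishedAt: {
    $gte: new Date("2000-01-01T00:00:00Z"),
    $lt: new Date("2011-01-01T00:00:00Z"),
  } } },
  // 2. Count books per (year, category) pair.
  { $group: {
    _id: { year: { $year: "$publishedAt" }, category: "$category" },
    count: { $count: {} },
  } },
  // 3. Regroup by year; keep per-category counts as a list of objects.
  { $group: {
    _id: "$_id.year",
    totalCount: { $sum: "$count" },
    categoryCount: { $push: { category: "$_id.category", count: "$count" } },
  } },
  // 4. Turn the list into one document; keys must be strings.
  { $addFields: { categoryCount: { $arrayToObject: { $map: {
    input: "$categoryCount",
    in: { k: { $toString: "$$this.category" }, v: "$$this.count" },
  } } } } },
  // 5. "Densify": merge over a zero-filled template (assumed 5 categories).
  { $addFields: { categoryCount: { $mergeObjects: [
    { $literal: { "0": 0, "1": 0, "2": 0, "3": 0, "4": 0 } },
    "$categoryCount",
  ] } } },
  // 6. Back to an array of { k, v } pairs.
  { $addFields: { categoryCount: { $objectToArray: "$categoryCount" } } },
  // 7. Cast keys back to numbers so they sort numerically.
  { $addFields: { categoryCount: { $map: {
    input: "$categoryCount",
    in: { k: { $toInt: "$$this.k" }, v: "$$this.v" },
  } } } },
  // 8. Sort by category so position i holds category i.
  { $addFields: { categoryCount: {
    $sortArray: { input: "$categoryCount", sortBy: { k: 1 } },
  } } },
  // 9. Keep only the counts, giving a flat list.
  { $addFields: { categoryCount: "$categoryCount.v" } },
  // Final ordering by year, as in the question's pipeline.
  { $sort: { _id: 1 } },
];
```

In mongosh this would be run as `db.books.aggregate(pipeline)`. Truncating the array after any stage is a convenient way to inspect the intermediate documents.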
In fact, my suggestion is not to use aggregation as an end-to-end transformation, but rather to stop at stage 3 or 4 and finish the transformation in your programming language, if you can. Good luck!
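As an illustration of finishing the transformation in application code, here is a minimal JavaScript sketch. It assumes the pipeline was stopped after the second $group, so each document carries `categoryCount` as a list of `{ category, count }` objects. That shape, the function name `toDenseCategoryCounts` and the `numCategories` parameter are all hypothetical choices made for this example.

```javascript
// Hypothetical helper: expands sparse per-category counts into a
// fixed-length array where index i holds the count for category i.
function toDenseCategoryCounts(rows, numCategories = 5) {
  return rows.map(({ _id, totalCount, categoryCount }) => {
    const dense = new Array(numCategories).fill(0);
    for (const { category, count } of categoryCount) {
      // Ignore anything outside the known enum range.
      if (Number.isInteger(category) && category >= 0 && category < numCategories) {
        dense[category] = count;
      }
    }
    return { _id, totalCount, categoryCount: dense };
  });
}

// Assumed shape of the documents after the second $group stage.
const stage3Output = [
  {
    _id: 2002,
    totalCount: 2,
    categoryCount: [
      { category: 2, count: 1 },
      { category: 1, count: 1 },
    ],
  },
  { _id: 2008, totalCount: 1, categoryCount: [{ category: 0, count: 1 }] },
];

const result = toDenseCategoryCounts(stage3Output);
console.log(JSON.stringify(result));
```

Because the array index is the category number, no string casting or sorting is needed on this side, which is the main reason the application-side version is shorter than the equivalent $addFields stages.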