I have all combination of compound indexes for this collection. The aggregattion query i used is:
db.products.aggregate( [
{
$facet: {
"categorizedByColor": [
{
$match: {
size: { $in : [50,60,70] },
brand: { $in : ["Raymond","Allen Solly","Van Heusen"] }
}
},
{
$bucket: {
groupBy: "$color",
default: "Other",
output: {
"count": { $sum: 1 }
}
}
}
],
"categorizedBySize": [
{
$match: {
color: { $in : ["Red","Green","Blue"] },
brand: { $in : ["Raymond","Allen Solly","Van Heusen"] }
}
},
{
$bucket: {
groupBy: "$size",
default: "Other",
output: {
"count": { $sum: 1 }
}
}
}
],
"categorizedByBrand": [
{
$match: {
color: { $in : ["Red","Green","Blue"] },
size: { $in : [50,60,70] }
}
},
{
$bucket: {
groupBy: "$brand",
default: "Other",
output: {
"count": { $sum: 1 }
}
}
}
],
"productResults": [
{
$match: {
color: { $in : ["Red","Green","Blue"] },
size: { $in : [50,60,70] },
brand: { $in : ["Raymond","Allen Solly","Van Heusen"] }
}
}
]
}
}
]);
This query took around 6s to populate the results. Is there any alterative approach available to use mongodb indexing?
Note: This aggregation query have more than 14 facet pipelines. For better understanding i have provided only 4 facet pipelines.
2
Answers
Facet stage by default cannot use indexes and will perform COLLSCAN (full scan) when executed.
Because of that, you should use filtering (and sorting) way earlier in your pipeline, in order to get the "common data" for all the sub-pipelines in $facet.
So, in your case, filters :
should be used as a first stage in pipeline, then followed by $facet.
Hope I was clear enough. 🙂
Sometimes 14 queries can do the job and sometimes not.
If the
$facet
is the first step in the aggregation pipeline, 14 queries are a more efficient option, but if this$facet
is following a complex pipeline to create or filter these documents, there are alternatives to this$facet
‘s$match
. Sometimes One needs a snapshot of the db, which 14 queries can not give, since the db may change in-between.Since we don’t have any data of former actions in this pipeline, and the question is regarding alternatives that will allow to use the indexes to make the rest of the query faster, I can offer one option for example. It is hard to tell if it will be faster than other options, according to the data we have here, but it will allow to use the indexes, which is the main idea of the question:
$facet
‘s$match
much easier, by marking in advance which document belongs to which$facet
pipeline.Playground example
Going a step further, there is even a way to get these results in one query without the
$facet
step at all, by using$group
with$push
with$cond
instead. This should iterate over the documents once, instead of 14 times, but may result in a large document (with duplicates of data per each categorization). The main idea of such a solution can be seen on this mongoDB playground. It is important to say that these methods are not necessarily better or worse than other. The "right" solution depends on your specific case and data, which we can’t see here. You asked for alternative approaches which will allow to use the indexes, so I’m pointing some directions.