A non-distributed database has many posts, posts have zero or more user-defined tags, most posts have the most_posts_have_this
tag, few posts have the few_posts_have_this
tag.
When querying {'tags': {'$all': ['most_posts_have_this', 'few_posts_have_this']}}
the query is slow, it seems to be iterating through posts with the most_posts_have_this
tag.
Is there some way to hint to MongoDB that it should be iterating through posts with the few_posts_have_this
tag instead?
2
Answers
Not really. When Mongo runs an
$all
it is going to get all records with both tags first. You could try using two$in
queries in an aggregation instead, selecting the less frequent tag first. I’m not sure if this would actually be faster (depends on how Mongo optimizes things) but could be worth a try.The best you can do:
Make sure you have an an index on the
tags
field. I see in the comments you have done this.Mongo may be using the wrong index for this query. You can see which it is using with
cursor.explain()
. You can force it to use yourtags
index withhint()
. First usedb.collection.getIndexes()
to make sure yourtags
index shows up as expected in the list of indexes.Using projections to return only the fields you need might speed things up. For example, depending on your use case, you might return just post IDs and then query full text for a smaller subset of the returned posts. This could speed things up because Mongo doesn’t have to manage as much intermediate data.
You could also consider periodically sorting the
tags
array field by frequency. If the least frequent tags are first, Mongo may be able to skip further scanning for that document. It will still fetch all the matching documents, but if your tag lists are very large it could save time by skipping the later tags. See The ESR (Equality, Sort, Range) Rule for more details on optimizing your indexed fields.If all that’s still not fast enough and the performance of these queries is critical, you’ll need to do something more drastic:
See Mongo’s optimizing query performance for more detail, but again, it is unlikely to help with this use case.
Short answer is no, this is due to how Mongo builds an index on an array:
So when you when you query the tags field imagine mongo queries each tag separately then it does an intersection.
If you run "explain" you will be able to see that after the index scan phase Mongo executes a fetch document phase, this phase in theory should be redundant for an pure index scan which shows this is not the case. So basically Mongo fetches ALL documents that have either of the tags, only then it performs the "$all" logic in the filtering phase.
So what can you do?
Otherwise you will have to reconsider a restructuring that will allow better index usage for this usecase, I will say for most access patterns your structure is better.
The new structure can be an object like so:
Now if you built an index on
tags2
each key of the object will be indexed separately, this will make mongo skip the "fetch" phase as the index contains all the information needed to execute the following query:I understand both solutions are underwhelming to say the least, but sadly Mongo does not excel in this specific use case.. I can think of more "hacky" approaches but I would say these 2 are the more reasonable ones to actually consider implementing depending on performance requirments.