I have a MongoDB collection with 8k+ documents, around 40GB. Inside it, the data follows this format:
{
_id: ...,
_session: {
_id: ...
},
data: {...}
}
I need to get all the _session._id for my application. The following approach (python) takes too long to get them:
cursor = collection.find({}, projection={'_session._id': 1})
I have created an Index in MongoDB Compass, but I’m not sure if my query is making use of it at all.
Is there a way to speed this query such that I get all the _session._id
very fast?
2
Answers
In mongo shell you can hint() the query optimizer to use the available index as follow:
Following test is confirmed to work via python:
db.test2.createIndex({"x.y":1})
python 3.7 ,
pymongo 3.11.2 ,
mongod 5.0.5
In your case seems to be text index , btw it seems abit strange why session is text index , for text index somethink like this must work:
And here is working example with text index:
db.test2.createIndex({"x.y":"text"}):
There are a few points of confusion in this question and the ensuing discussion which generally come down to:
hint
it failed)Current Indexes
I think there are at least 5 indexes that were mentioned so far:
{"_session._id":1}
mentioned originally in @R2D2’s answer._session._id
field (mentioned in this comment)_ts_meta.session
field (mentioned in this comment){"x.y":1}
mentioned second in @R2D2’s answer.{"x.y":"text"}
mentioned at the end of @R2D2’s answer.Only the first of these is likely to even really be relevant to the original question. Note that the difference a text index is a specialized index that is meant for performing more advanced text searching. Such indexes are not required for simple string matching or value retrieval. But standard indexes,
{ '_session._id': 1}
, will also store string values and are relevant here.What Indexing is For
Indexes are typically useful for retrieving a small subset of results from the database. The larger that set of results becomes relative to the overall size of the collection, the less helpful using an index will become. In your situation you are looking to retrieve data from all of the documents in the collection which is why the database doesn’t consider using any index at all.
Now it is still possible that an index could help in this situation. That would be if we used it to perform a covered query which means that the data can be retrieved from the index alone without looking at the documents themselves. In this case the database would have to scan the full index, so it is not clear that it would be faster or not. But you could certainly try. To do so you would need to follow @R2D2’s instructions, specifically by creating the index and then hinting it in the query (while also projecting out the
_id
field):Additional Questions
There were two other things mentioned in the question that are important to address.
We talked about why this was the case above. But to find out if the database is using it or not you could navigate to the Explain tab in compass to take a look. If you explain plan visualization it should indicate if the index was used. Remember that you will need to hint the index based on your query.
What is your definition of "very fast" here?
The general answer is that your operation requires scanning either all documents in the collection or a full index. There is no way to do this more efficiently based on the current schema. Therefore how fast it happens is largely going to come down to the hardware that the database is running on and it will slow down as the collection grows.
If this operation is something that you will be running frequently or have strict performance requirements around, then it may be important to think through your intended goals to see if there are other ways of achieving them. What will you or the application be doing with this list of session IDs?