I would like to get behavior in MongoDB similar to partitioning in PostgreSQL. That is, on a single database instance, without a cluster or additional nodes, I want the data to be split up, for example by date, so that when I query the collection, MongoDB scans in parallel only those sub-collections that match the condition.
For example, given a collection with documents like
{
  date: ISODate('2024-03-03T05:05:05'),
  temperature: 25
}
and a query like
db.my_collection.find({date: { $gt: ISODate('2024-03-03') } })
Then only sub-collections that store data newer than 2024-03-03 would be scanned in parallel.
This is how partitioning works in PostgreSQL.
To repeat, I am not interested in sharding or in distributing load across different servers. I have only one server with a very large amount of data.
2 Answers
Please check out MongoDB indexes.
Indexes are special data structures that store a small portion of the collection’s data set in an easy-to-traverse form. MongoDB indexes use a B-tree data structure.
Full Documentation: MongoDB Indexes
In your case, you need to add an index on the date field of your collection to limit the number of documents scanned.
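As a minimal sketch of that suggestion, assuming `db` is a mongosh database handle and the collection is named my_collection as in the question (the wrapper function ensureDateIndex is only an illustrative name):

```javascript
// Assumption: `db` is a mongosh database handle; `ensureDateIndex` is a
// hypothetical wrapper, not part of MongoDB.
function ensureDateIndex(db) {
  // Ascending single-field index on `date`. Range filters such as
  // { date: { $gt: ISODate('2024-03-03') } } can then use the index
  // instead of scanning the whole collection.
  return db.my_collection.createIndex({ date: 1 });
}
```

In mongosh this would be called as ensureDateIndex(db). Running the query with .explain("executionStats") afterwards should show whether the index is actually used.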
MongoDB doesn’t have Partitioning like that. That could be done with Sharding but since you don’t want to do that, your only option is to create collections and move data periodically.
But the collections must have different names, and you can’t just call
db.my_collection.find()
and have the right collection selected automatically. That’s what sharding would do. You’ll need something like a
get_collection_for_date('2024-03-03').find(...)
helper. You’ll then need to handle date ranges that span multiple collections and combine the results of the queries against each of them, perhaps building the combined query with $unionWith. None of this can be done transparently.
If your concern is the number of documents or the time it takes to retrieve them:
Use indexing for fields frequently used in Find/Match queries, like your date.
Instead of having one document per reading/temperature, each document should have the readings for a specific time range. See Group Data with the Bucket Pattern.
Use an actual Time Series Collection.
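To illustrate the bucket pattern from the list above, here is a rough sketch that groups individual readings into one document per UTC hour. The function name bucketByHour and the bucket fields (bucketStart, count, readings) are assumptions chosen for illustration, not a MongoDB API; the one-hour bucket size is also just an example.

```javascript
// Hypothetical helper: turn many small reading documents into one document
// per UTC hour, each holding all readings that fall inside that hour.
function bucketByHour(readings) {
  const buckets = new Map();
  for (const { date, temperature } of readings) {
    // Truncate the timestamp to the start of its UTC hour.
    const start = new Date(date);
    start.setUTCMinutes(0, 0, 0);
    const key = start.getTime();
    if (!buckets.has(key)) {
      buckets.set(key, { bucketStart: start, count: 0, readings: [] });
    }
    const bucket = buckets.get(key);
    bucket.readings.push({ date, temperature });
    bucket.count += 1;
  }
  return [...buckets.values()];
}

// A time series collection (MongoDB 5.0+) does comparable bucketing
// internally. In mongosh it can be created with, for example:
// db.createCollection("my_collection", {
//   timeseries: { timeField: "date", granularity: "hours" }
// });
```

The resulting bucket documents could be inserted with insertMany. With a time series collection you keep inserting one document per reading and leave the bucketing to the server.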
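For the manual approach with separate collections described earlier in this answer, the routing logic might look roughly like the sketch below. It assumes one collection per calendar month (UTC), named like readings_2024_03. That naming scheme and the helpers collectionForDate, collectionsForRange, and buildRangeQuery are all hypothetical. Only $unionWith (available since MongoDB 4.4) is an actual MongoDB feature.

```javascript
// Hypothetical naming scheme: one collection per UTC calendar month.
function collectionForDate(date, prefix = "readings") {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${prefix}_${year}_${month}`;
}

// List every monthly collection that may hold documents in [from, to].
// Callers are expected to pass from <= to.
function collectionsForRange(from, to, prefix = "readings") {
  const names = [];
  // Start at the first day of the month containing `from`.
  const cursor = new Date(Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), 1));
  while (cursor <= to) {
    names.push(collectionForDate(cursor, prefix));
    cursor.setUTCMonth(cursor.getUTCMonth() + 1);
  }
  return names;
}

// Build an aggregation to run on the first collection. Every further
// collection is appended with $unionWith, using the same date filter.
function buildRangeQuery(from, to, prefix = "readings") {
  const [first, ...rest] = collectionsForRange(from, to, prefix);
  const match = { $match: { date: { $gte: from, $lte: to } } };
  const unions = rest.map((coll) => ({ $unionWith: { coll, pipeline: [match] } }));
  return { collection: first, pipeline: [match, ...unions] };
}
```

In mongosh, the result q of buildRangeQuery could then be run as db.getCollection(q.collection).aggregate(q.pipeline). Note that this only restricts which collections are read; the application still has to create the monthly collections and write each document to the right one.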