
I would like to get behavior in MongoDB similar to partitioning in PostgreSQL. That is, on a single database instance, without a cluster or additional nodes, I want the data to be split, for example by date, so that when I query the collection, MongoDB scans in parallel only those sub-collections that match the condition.

For example, a collection with documents like

{
 date: ISODate('2024-03-03T05:05:05'),
 temperature: 25
}

When I run a query like

db.my_collection.find({date: { $gt: ISODate('2024-03-03') } })

then only the sub-collections that store data newer than 2024-03-03 would be scanned, in parallel.

This is how partitioning works in PostgreSQL.

To repeat, I am not interested in sharding or in distributing load across different servers. I have only one server with a very large amount of data.

2 Answers


  1. Please check out MongoDB Indexes.

    Indexes are special data structures that store a small portion of the collection’s data set in an easy-to-traverse form. MongoDB indexes use a B-tree data structure.

    Full Documentation: MongoDB Indexes

    In your case, you need to add an index on the date field of your collection to limit the number of documents that are scanned.
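
As a minimal sketch of that suggestion, the snippet below defines the index specification and the range filter from the question as plain objects. The helper name `newerThan` is made up for illustration, and the mongosh calls in the comments assume the collection is called `my_collection`, as in the question.

```javascript
// Ascending single-field index on `date`. For a single-field index the sort
// order matters little, because MongoDB can traverse it in either direction.
const dateIndex = { date: 1 };

// Hypothetical helper: builds the filter used in the question.
function newerThan(cutoff) {
  return { date: { $gt: cutoff } };
}

// In mongosh (not executed here):
//   db.my_collection.createIndex(dateIndex)
//   db.my_collection.find(newerThan(ISODate('2024-03-03'))).explain('executionStats')
// With the index in place, the winning plan would be expected to show an
// IXSCAN stage rather than a COLLSCAN.
```

Checking the `explain` output is a quick way to confirm that the query really uses the index before trying anything more invasive.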

  2. MongoDB doesn’t have Partitioning like that. That could be done with Sharding but since you don’t want to do that, your only option is to create collections and move data periodically.

    But the collections must have different names and you can’t do db.my_collection.find() to automatically select the right collection. That’s what sharding would do. You’ll need a get_collection_for_date('2024-03-03').find(...).

    And then you’ll need to handle date ranges for queries that span multiple collections, and combine the results of separate queries against different collections. You could even build a single aggregation across them using $unionWith. None of this can be done transparently.
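
A rough sketch of what that manual routing could look like, assuming one collection per UTC month named like `readings_2024_03`. The naming scheme, the `readings` prefix, and all three function names are assumptions for illustration, not anything MongoDB provides.

```javascript
// Hypothetical naming scheme: one collection per UTC month, e.g. "readings_2024_03".
function collectionForDate(date, prefix = "readings") {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${prefix}_${year}_${month}`;
}

// List every monthly collection touched by the inclusive range [from, to].
// Returns an empty array when `from` is later than `to`.
function collectionsForRange(from, to, prefix = "readings") {
  const names = [];
  // Start at the first day of `from`'s month and step one month at a time.
  const cursor = new Date(Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), 1));
  while (cursor <= to) {
    names.push(collectionForDate(cursor, prefix));
    cursor.setUTCMonth(cursor.getUTCMonth() + 1);
  }
  return names;
}

// Build one pipeline: a $match for the first collection, then one $unionWith
// stage (carrying the same $match) for each remaining collection.
function buildUnionPipeline(from, to, prefix = "readings") {
  const [first, ...rest] = collectionsForRange(from, to, prefix);
  const match = { $match: { date: { $gte: from, $lte: to } } };
  const pipeline = [
    match,
    ...rest.map((coll) => ({ $unionWith: { coll, pipeline: [match] } })),
  ];
  return { first, pipeline };
}

// In mongosh (not executed here; $unionWith needs MongoDB 4.4 or later):
//   const { first, pipeline } = buildUnionPipeline(ISODate('2024-01-15'), ISODate('2024-03-03'))
//   db.getCollection(first).aggregate(pipeline)
```

The application still has to create the monthly collections, index each one, and move or expire old data itself, which is the non-transparent part this answer describes.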

    If your concern is the number of documents or the time it takes to retrieve them:

    1. Use Indexing for fields frequently used in Find/Match queries, like your date.

    2. ‼ Instead of having one document per reading/temperature, each document should have the readings for a specific time-range. See Group Data with the Bucket Pattern.

      • So instead of having documents like
      {
        date: ISODate('2024-03-03T05:05:05'),
        temperature: 25
      }
      
      • The docs should have a range of readings:
      {
        from_date: ISODate("2024-03-03T05:00:00Z"),
        to_date: ISODate("2024-03-03T06:00:00Z"),
        readings: [
          { date: ISODate("2024-03-03T05:05:05Z"), temperature: 25 },
          { date: ISODate("2024-03-03T05:05:06Z"), temperature: 27 },
          ...
        ]
      }
      
    3. Use an actual Time Series Collection.
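
To make option 2 concrete, here is a small sketch that groups flat readings into hourly bucket documents shaped like the example above. The function name `bucketByHour` and the one-hour bucket size are illustrative choices; the right bucket width depends on how many readings arrive per hour. The closing comment shows how option 3 might be set up in mongosh, with `weather` as a placeholder collection name.

```javascript
// Group flat { date, temperature } readings into one document per UTC hour.
function bucketByHour(readings) {
  const HOUR_MS = 60 * 60 * 1000;
  const buckets = new Map();
  for (const reading of readings) {
    // Round the timestamp down to the start of its UTC hour.
    const start = Math.floor(reading.date.getTime() / HOUR_MS) * HOUR_MS;
    if (!buckets.has(start)) {
      buckets.set(start, {
        from_date: new Date(start),
        to_date: new Date(start + HOUR_MS),
        readings: [],
      });
    }
    buckets.get(start).readings.push(reading);
  }
  // Return the buckets ordered by start time.
  return [...buckets.values()].sort((a, b) => a.from_date - b.from_date);
}

// For option 3, in mongosh (not executed here; requires MongoDB 5.0 or later):
//   db.createCollection("weather", { timeseries: { timeField: "date", granularity: "seconds" } })
```

With buckets like these, an index on `from_date` has one entry per hour instead of one per reading, which is where most of the saving comes from.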
