
I want to insert a massive amount of data (15TB) into MongoDB.
I don’t have many queries, so I skipped setting up a replica set.

Here is my main question: is there any benefit to running a MongoDB cluster in Docker containers that share the same resources, instead of just running it standalone (for example, better use of CPU or disk)?

Also, would sharding MongoDB help me in either the dockerized-cluster or the standalone mode?

I have already loaded 1TB of the same data into a standalone MongoDB and it worked fine for me.

Please write a comment if you have any other advice.

2 Answers


  1. If you’re referring to a "sharded cluster": the idea behind it is to improve performance by spreading the data across different nodes. Whether or not this is the "right choice" for you depends on your data, the shard key distribution, and your performance needs.

    It’s just a trade-off: you pay for more resources in order to increase performance.

    I will say that 15TB (if it’s all in one collection) probably warrants some sort of data distribution. Without sharding, you’d probably need very large disks with high IOPS just to support simple insertions and queries.

  2. A standalone (single mongod) is a bad option and is not recommended for anything other than local development tests.
    There are a few reasons:

    1. Quite a few features, such as sessions, transactions, and retryable writes, are not supported on a standalone server.
    2. A replica set gives you a flexible read configuration that may increase your performance.
    3. One of the main reasons is failover. If your standalone server (mongod) is not available, the whole app can’t work, whereas with a replica set configuration a new primary is elected shortly and you can keep working with it.

    Would sharding MongoDB help me in either the dockerized-cluster or the standalone mode?

    Sharding should also be built on replica sets, for the reasons above. The only difference is that with sharding enabled you don’t communicate with the server(s) directly but via a dispatcher (mongos), which is responsible for routing your queries to the particular shards. The problems at the level of each shard, however, remain the same as I described above.
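
To make the "shard key distribution" point from the first answer a bit more concrete, here is a rough, self-contained Python sketch. It does not talk to MongoDB at all. It only simulates how a router in front of the shards might assign inserts to shards under two kinds of shard key. Everything in it is an assumption made for illustration: the shard count, the document count, the use of MD5 as a stand-in hash, and the fixed equal-width ranges (a real cluster splits and migrates chunks over time).

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4        # hypothetical shard count, not a recommendation
NUM_DOCS = 100_000    # small stand-in for the real data set
TOTAL_TB = 15         # data size from the question


def hashed_shard(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from a hash of the key (toy stand-in for a hashed shard key)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def ranged_shard(key: int, max_key: int = NUM_DOCS, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from fixed, equal-width key ranges (toy stand-in for a ranged shard key)."""
    width = max_key // num_shards
    return min(key // width, num_shards - 1)


def writes_per_shard(route, keys):
    """Count how many inserts each shard receives for the given keys."""
    counts = Counter(route(k) for k in keys)
    return [counts.get(s, 0) for s in range(NUM_SHARDS)]


# The most recent 10% of inserts, assuming a monotonically increasing key
# such as a counter or a timestamp.
recent_keys = range(int(NUM_DOCS * 0.9), NUM_DOCS)

hashed_counts = writes_per_shard(hashed_shard, recent_keys)
ranged_counts = writes_per_shard(ranged_shard, recent_keys)

# Naive storage estimate per shard, ignoring indexes, compression and replicas.
tb_per_shard = TOTAL_TB / NUM_SHARDS

print("hashed key, recent writes per shard:", hashed_counts)
print("ranged key, recent writes per shard:", ranged_counts)
print("approx. TB per shard if evenly spread:", tb_per_shard)
```

In this toy model the ranged, ever-increasing key sends all 10,000 recent inserts to the last shard, while the hashed key spreads them roughly evenly. For a bulk load like the one in the question, that is the kind of difference that decides whether the extra shards actually share the insert work.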
