I want to insert a massive amount of data (15 TB) into MongoDB.
I don't have many queries, so I skipped setting up a replica set.
Here is my main question: is there any benefit to clustering MongoDB on Docker with shared resources instead of just running it standalone (for example, better use of CPU or disk)?
And would sharding MongoDB help me in either the dockerized-cluster or the standalone mode?
I have already loaded 1 TB of the same data into a standalone MongoDB, and it was fine for me.
Please leave a comment if you have any other advice.
2 Answers
If you're referring to a "sharded cluster", the idea behind it is to improve performance by spreading the data across different nodes. Whether or not this is the "right choice" for you depends on your data, the shard key distribution, and your performance needs.
It's just a trade-off: paying for more resources in order to increase performance.
I will say that 15 TB (if it's all in one collection) probably warrants some sort of data distribution; without sharding you'd probably need very large disks with high IOPS just to support simple insertions and queries.
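To make the point about shard key distribution concrete, here is a minimal sketch in plain Python. It is a toy model, not MongoDB's actual placement logic: `hashed_shard`, `ranged_shard`, the shard count and the range boundaries are all hypothetical names and values chosen for illustration, and it ignores chunk splitting and the balancer. It only shows why an ever-increasing key with range-based placement tends to send a whole bulk load to one shard, while a hashed key spreads it out.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def hashed_shard(key, num_shards=NUM_SHARDS):
    """Toy hashed placement: hash the key, then take it modulo the shard count."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def ranged_shard(key, upper_bounds):
    """Toy ranged placement: the first shard whose upper bound exceeds the key.

    Keys beyond every bound fall into the last shard.
    """
    for shard, upper in enumerate(upper_bounds):
        if key < upper:
            return shard
    return len(upper_bounds)


# Ranges chosen back when keys 0..999 were the whole data set (assumption).
upper_bounds = [250, 500, 750]

# A new batch with ever-increasing keys, e.g. an insertion counter or timestamp.
new_keys = range(1000, 2000)

ranged_counts = Counter(ranged_shard(k, upper_bounds) for k in new_keys)
hashed_counts = Counter(hashed_shard(k) for k in new_keys)

print("ranged:", dict(ranged_counts))
print("hashed:", dict(sorted(hashed_counts.items())))
```

In this toy run every new document under ranged placement lands on the last shard, so the extra nodes add no insert throughput, while the hashed variant splits the batch roughly evenly. Which behaviour you want still depends on your queries, since hashing gives up efficient range scans on the key.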
Standalone (a single mongod) is a bad option and is not recommended anywhere other than for local development tests. There are a few reasons:
Sharding should also be based on replica sets, for the reasons above. The only difference is that when you have sharding enabled, you communicate with the server(s) not directly but via a dispatcher (mongos), which is responsible for redirecting your queries to the particular replica sets (shards). The problems at the cluster level remain the same as I described above.
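The routing idea can be sketched with a toy model in plain Python. Nothing here is MongoDB code: `Shard`, `Router`, the member names and the "any healthy member can act as primary" rule are simplifying assumptions (a real replica set needs a majority of members to elect a primary). The sketch only illustrates that a dispatcher forwards each write to one shard, so a shard backed by a single node stays a single point of failure for its share of the data.

```python
import hashlib


class Shard:
    """Toy shard: a named group of members holding a list of documents."""

    def __init__(self, name, members):
        self.name = name
        self.healthy = {member: True for member in members}
        self.docs = []

    def has_primary(self):
        # Simplification: any healthy member can act as primary.
        return any(self.healthy.values())

    def insert(self, doc):
        if not self.has_primary():
            raise RuntimeError(f"{self.name}: no healthy member to accept writes")
        self.docs.append(doc)


class Router:
    """Toy dispatcher playing the role the answer assigns to mongos."""

    def __init__(self, shards, shard_key):
        self.shards = shards
        self.shard_key = shard_key

    def shard_for(self, doc):
        # Pick a shard from a hash of the shard key value.
        digest = hashlib.md5(str(doc[self.shard_key]).encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def insert(self, doc):
        shard = self.shard_for(doc)
        shard.insert(doc)
        return shard.name


replicated = Shard("shard_a", ["a1", "a2", "a3"])  # three-member shard
single = Shard("shard_b", ["b1"])                  # single-node shard
router = Router([replicated, single], shard_key="user_id")

# Lose one node in each shard.
replicated.healthy["a1"] = False
single.healthy["b1"] = False

results = {}
for user_id in range(100):
    try:
        results[user_id] = router.insert({"user_id": user_id})
    except RuntimeError:
        results[user_id] = "failed"

print("stored on shard_a:", len(replicated.docs))
print("failed writes:", sum(1 for r in results.values() if r == "failed"))
```

Writes routed to the three-member shard keep succeeding after one node is lost, while every write routed to the single-node shard fails, even though the client only ever talks to the router.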