Here is my mongo cluster (sharding with replicaset) configuration.
replica sets:
rs0 - IP1, IP2, IP3 || port - 27017
rs1 - IP4, IP5, IP6 || port - 27017
config server replica set - IP7, IP8, IP9 || port - 26017
mongos - IP7, IP8, IP9 || port - 26000
This is a test setup, and the configuration was set up using IPs (not hostnames). Unfortunately, all hosts went down following maintenance, and all host IPs changed when we brought the nodes back up. As expected, the replica sets (mongod), config servers (mongod) and mongos didn’t come up because the old IP addresses were unreachable.
To bring up the setup, I did the following
- Updated replica set host IP addresses following https://www.mongodb.com/docs/v4.2/tutorial/change-hostnames-in-a-replica-set/
- Updated the config server replica set host IPs following the same MongoDB document. Started the mongod services without sharding.
- Didn’t find any proper documentation on changing config server and mongos IP addresses/hostnames. On the config server replica set, updated the "shards" collection in the config db:
cfg1 = db.shards.findOne( { "_id": "rs0" } )
cfg1.host = "rs0/new_IP1:27017,new_IP2:27017,new_IP3:27017"
db.shards.update({ "_id" : "rs0" } , cfg1 )
cfg2 = db.shards.findOne( { "_id": "rs1" } )
cfg2.host = "rs1/new_IP4:27017,new_IP5:27017,new_IP6:27017"
db.shards.update({ "_id" : "rs1" } , cfg2 )
- Started config server and mongos properly.
- Now restarting the replica set members to make use of sharding. However, the replica set mongod processes are not starting, citing references to the old config server replica set IPs. I am getting the following error in mongod.log:
2022-05-17T21:20:39.654+0530 W SHARDING [initandlisten] Error initializing sharding state, sleeping for 2 seconds and trying again :: caused by :: FailedToSatisfyReadPreference: Error loading clusterID :: caused by :: Could not find host matching read preference { mode: "nearest" } for set csrs
2022-05-17T21:20:40.154+0530 I ASIO [ReplicaSetMonitor-TaskExecutor] Connecting to x.x.x.x:26017
2022-05-17T21:20:41.655+0530 I ASIO [ReplicaSetMonitor-TaskExecutor] Connecting to y.y.y.y:26017
2022-05-17T21:20:42.660+0530 I ASIO [ReplicaSetMonitor-TaskExecutor] Failed to connect to z.z.z.z:26017 - HostUnreachable: Error connecting to 10.0.13.206:26017 :: caused by :: No route to host
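As an aside on the `config.shards` step above: instead of typing each host string by hand, the rewrite can be driven by an old-address to new-address map, which makes it harder to mix up members between shards. This is only a sketch; `remapShardHost` and the contents of `ipMap` are hypothetical, and the `IP4`/`new_IP4` placeholders stand in for the real addresses.

```javascript
// Hypothetical helper: rewrite a shard host string such as
// "rs1/IP4:27017,IP5:27017" using an old-address -> new-address map.
function remapShardHost(host, ipMap) {
  const slash = host.indexOf("/");
  if (slash < 0) {
    throw new Error("expected '<setName>/<host:port>,...' but got: " + host);
  }
  const setName = host.slice(0, slash);
  const members = host.slice(slash + 1).split(",");
  const remapped = members.map((member) => {
    const colon = member.lastIndexOf(":");
    const addr = member.slice(0, colon);
    const port = member.slice(colon + 1);
    // Addresses missing from the map are left unchanged.
    const known = Object.prototype.hasOwnProperty.call(ipMap, addr);
    return (known ? ipMap[addr] : addr) + ":" + port;
  });
  return setName + "/" + remapped.join(",");
}

// Placeholder addresses, not real ones.
const ipMap = { IP4: "new_IP4", IP5: "new_IP5", IP6: "new_IP6" };
const newHost = remapShardHost("rs1/IP4:27017,IP5:27017,IP6:27017", ipMap);

// In the mongo shell, with the config db selected, the result could be applied with:
// db.shards.updateOne({ _id: "rs1" }, { $set: { host: newHost } })
```

Using `$set` on the `host` field touches only that field, rather than replacing the whole document as the `findOne`/`update` pair does.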
I couldn’t find any help on the web to recover from this scenario. Requesting assistance in recovering the setup without losing any data, as we have loaded TBs of data on this cluster.
2 Answers
The issue is solved now.
The final piece of the puzzle was to find where the config server connection info was saved in the replica set mongod. It's in the system.version collection under the admin db. I had to follow these steps:
- system.version had the config server connection string: db.system.version.find( {"_id" : { $in : [ "shardIdentity" , "minOpTimeRecovery" ]} })
- Updated it with the db.system.version.update command.
Note: I am new to mongo and not sure if we should be making changes to internal system collections. Since it was a test setup, I took the risk and did these experiments, which paid off. It's not recommended in a production environment, as a resolution can't be guaranteed.
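A minimal sketch of what that update could look like, assuming the connection string lives in a `configsvrConnectionString` field of both documents (check this against the output of the `find` above before changing anything). `buildConfigsvrUpdate` and the `new_IP7`..`new_IP9` placeholders are hypothetical; the set name `csrs` and port 26017 come from the log in the question.

```javascript
// Hypothetical helper: build the filter and update document for the
// admin.system.version entries that hold the config server connection string.
function buildConfigsvrUpdate(setName, hosts) {
  const connectionString = setName + "/" + hosts.join(",");
  return {
    filter: { _id: { $in: ["shardIdentity", "minOpTimeRecovery"] } },
    // Assumption: the field is named configsvrConnectionString in both documents.
    update: { $set: { configsvrConnectionString: connectionString } },
  };
}

// Placeholder addresses, not real ones.
const spec = buildConfigsvrUpdate("csrs", [
  "new_IP7:26017",
  "new_IP8:26017",
  "new_IP9:26017",
]);

// In the mongo shell, connected to the shard's mongod:
// db.getSiblingDB("admin").system.version.updateMany(spec.filter, spec.update)
```

Building the filter and update as plain objects first makes it easy to print and review them before running anything against an internal collection.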
I ran this procedure as a test on my local machine. It seems to work, but I cannot guarantee anything.
- mongod/mongos services on all nodes
mongod Config ReplicaSet
- local database
- config.shards
- dbPath of all other config servers
Example (Windows style):
mongod Shard ReplicaSet
Repeat below for each shard
- local database
- admin.system.version
- dbPath of all other shard servers
Example (Windows style):
mongos Router
This one is the simplest part.
- sharding.configDB string
- mongos