Here is my mongo cluster (sharding with replicaset) configuration.

replica sets:
rs0 - IP1, IP2, IP3 || port - 27017
rs1 - IP4, IP5, IP6 || port - 27017

config server replica set - IP7, IP8, IP9 || port - 26017
mongos - IP7, IP8, IP9 || port - 26000

This is a test setup, and the configuration was set up using IPs (not hostnames). Unfortunately, all hosts went down following a maintenance, and all host IPs changed when we brought the nodes back up. Obviously the replica sets (mongod), config servers (mongod) and mongos didn’t come up due to unreachable IP addresses.

To bring up the setup, I did the following

  1. Updated replica set host IP addresses following https://www.mongodb.com/docs/v4.2/tutorial/change-hostnames-in-a-replica-set/
  2. Updated config server replica set host IPs following the same mongo document. Started mongod services w/o sharding.
  3. Didn’t find any proper documentation on changing the config server & mongos IP addresses/hostnames. On the config server replica set, I updated the "shards" collection in the config db:
cfg1 = db.shards.findOne( { "_id": "rs0" } )
cfg1.host = "rs0/new_IP1:27017,new_IP2:27017,new_IP3:27017"
db.shards.update({ "_id" : "rs0" } , cfg1 )

cfg2 = db.shards.findOne( { "_id": "rs1" } )
cfg2.host = "rs1/new_IP4:27017,new_IP5:27017,new_IP6:27017"
db.shards.update({ "_id" : "rs1" } , cfg2 )
  4. Started config server and mongos properly.
  5. Now restarting the replica set members to make use of sharding. However, the replica set mongod processes are not starting, citing references to the old config server replica set IPs. I am getting the following error in mongod.log:
2022-05-17T21:20:39.654+0530 W SHARDING [initandlisten] Error initializing sharding state, sleeping for 2 seconds and trying again :: caused by :: FailedToSatisfyReadPreference: Error loading clusterID :: caused by :: Could not find host matching read preference { mode: "nearest" } for set csrs
2022-05-17T21:20:40.154+0530 I ASIO     [ReplicaSetMonitor-TaskExecutor] Connecting to x.x.x.x:26017
2022-05-17T21:20:41.655+0530 I ASIO     [ReplicaSetMonitor-TaskExecutor] Connecting to y.y.y.y:26017
2022-05-17T21:20:42.660+0530 I ASIO     [ReplicaSetMonitor-TaskExecutor] Failed to connect to z.z.z.z:26017 - HostUnreachable: Error connecting to 10.0.13.206:26017 :: caused by :: No route to host
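
To confirm which addresses a shard mongod is still trying to reach, the targets can be pulled out of log lines like the ones above. This is only a rough sketch in plain JavaScript: `extractTargets` is a hypothetical helper, and the sample lines use placeholder addresses and elided prefixes rather than real log output.

```javascript
// Hypothetical helper: list the distinct host:port targets named in
// "Connecting to" / "Failed to connect to" log lines.
function extractTargets(logText) {
  const pattern = /(?:Connecting to|Failed to connect to) (\S+:\d+)/g;
  const seen = {};
  let match;
  while ((match = pattern.exec(logText)) !== null) {
    seen[match[1]] = true; // object keys de-duplicate repeated targets
  }
  return Object.keys(seen);
}

// Placeholder lines for illustration only.
const sampleLog = [
  "... I ASIO     [ReplicaSetMonitor-TaskExecutor] Connecting to 10.0.0.7:26017",
  "... I ASIO     [ReplicaSetMonitor-TaskExecutor] Connecting to 10.0.0.7:26017",
  "... I ASIO     [ReplicaSetMonitor-TaskExecutor] Failed to connect to 10.0.0.9:26017 - HostUnreachable: ..."
].join("\n");

const targets = extractTargets(sampleLog);
```

If every address in `targets` belongs to the old config server hosts, the stale connection string is stored on the shard itself rather than coming from the config servers.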

I couldn’t find any help on the web to recover from this scenario. Requesting assistance in recovering the setup without losing any data, as we have loaded TBs of data on this cluster.
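
The `config.shards` edits in step 3 apply the same old-to-new substitution to every shard entry, so the `host` strings could be generated instead of typed by hand. A minimal sketch in plain JavaScript, which should also run in the mongo shell; `remapShardHost` and `ipMap` are hypothetical names, and the addresses are placeholders.

```javascript
// Hypothetical helper: rewrite "setName/host:port,host:port,..." using an old->new IP map.
function remapShardHost(hostString, ipMap) {
  const slash = hostString.indexOf("/");
  if (slash < 0) {
    throw new Error("expected '<setName>/<host:port>,...' but got: " + hostString);
  }
  const setName = hostString.slice(0, slash);
  const members = hostString.slice(slash + 1).split(",");
  const remapped = members.map(function (member) {
    const idx = member.lastIndexOf(":");
    const host = member.slice(0, idx);
    const port = member.slice(idx + 1);
    // Leave the member unchanged when the map has no entry for it.
    const known = Object.prototype.hasOwnProperty.call(ipMap, host);
    return (known ? ipMap[host] : host) + ":" + port;
  });
  return setName + "/" + remapped.join(",");
}

// Placeholder addresses for illustration only.
const ipMap = { "10.0.0.1": "10.1.0.1", "10.0.0.2": "10.1.0.2", "10.0.0.3": "10.1.0.3" };
const newHost = remapShardHost("rs0/10.0.0.1:27017,10.0.0.2:27017,10.0.0.3:27017", ipMap);

// In the mongo shell, on the config database, the result could then be applied with:
//   db.shards.updateOne({ _id: "rs0" }, { $set: { host: newHost } })
```

Generating the string from one map keeps the shard entries consistent with each other and avoids copy-paste slips between `rs0` and `rs1`.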

2 Answers


  1. Chosen as BEST ANSWER

    The issue is solved now.

    The final piece of the puzzle was finding where the config server connection info is saved in the replica set mongod. It's in the system.version collection under the admin db. I followed these steps:

    1. Start the mongod on all replica set members with security authorization, replication and sharding disabled. I made the necessary changes in the config file.
    2. Under admin db, the following two documents in system.version had the config server connection string.

    db.system.version.find( {"_id" : { $in : [ "shardIdentity" , "minOpTimeRecovery" ]} })

    3. Updated both documents with the new config server connection string via the db.system.version.update command.
    4. Shut down the mongod processes and enabled security authorization, replication and sharding in the mongod config file.
    5. Successfully started replica set mongod instances.
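
The update of the two system.version documents could look roughly like the sketch below. It assumes that both documents keep the string in a field named `configsvrConnectionString` (the second answer updates that field on `shardIdentity`; for `minOpTimeRecovery` this is an assumption, so check the output of the `find` above first). `buildConfigsvrUpdates` is a hypothetical helper, `csrs` is the set name seen in the error log, and the addresses are placeholders.

```javascript
// Hypothetical helper: build one filter/update pair per admin.system.version
// document that embeds the config server connection string.
function buildConfigsvrUpdates(setName, hosts) {
  const connString = setName + "/" + hosts.join(",");
  return ["shardIdentity", "minOpTimeRecovery"].map(function (id) {
    return {
      filter: { _id: id },
      // Assumption: the field is called configsvrConnectionString in both documents.
      update: { $set: { configsvrConnectionString: connString } }
    };
  });
}

// Placeholder addresses; port 26017 is the config server port from the question.
const updates = buildConfigsvrUpdates(
  "csrs",
  ["10.1.0.7:26017", "10.1.0.8:26017", "10.1.0.9:26017"]
);

// In the mongo shell, with mongod started standalone as in step 1,
// each pair could then be applied with:
//   updates.forEach(function (u) {
//     db.getSiblingDB("admin").getCollection("system.version").updateOne(u.filter, u.update);
//   });
```

Using `$set` changes only the connection string and leaves the other fields of each document untouched.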

    Note: I am new to mongo and not sure whether we should be making changes to internal system collections. Since it was a test setup, I took the risk and did these experiments, which paid off. It's not recommended in a production environment, as a resolution can't be guaranteed.


  2. I ran this procedure as a test on my local machine. It seems to work, but I cannot guarantee anything.

    • Stop all mongod/mongos services on all nodes

    mongod Config ReplicaSet

    • Start one mongod config server in maintenance mode
    • Drop local database
    • Update config.shards
    • Shutdown mongod
    • Delete dbPath of all other config servers
    • Start all mongod config servers
    • Connect to first mongod config server
    • Initiate ReplicaSet

    Example (Windows style):

    SET MAINTENANCE_LOG=--logpath C:\MongoDB\log\mongo_maintenance.log --logappend
    SET MAINTENANCE_NET=--bind_ip localhost --port 55555
    SET MAINTENANCE_MISC=--setParameter skipShardingConfigurationChecks=true --setParameter disableLogicalSessionCacheRefresh=true
    
    
    start mongod --dbpath C:MongoDBdatamongocfg_1 %MAINTENANCE_LOG% %MAINTENANCE_MISC% %MAINTENANCE_NET%
    mongo --norc localhost:55555/admin 
    db.getSiblingDB('local').dropDatabase()
    db.getSiblingDB('config').getCollection("shards").updateOne(
       {_id : "shard_01"}, 
       {$set: {host: "shard_01/<new_IP:port>,<new_IP:port>" }}
    )
    db.getSiblingDB('config').getCollection("shards").updateOne(
       {_id : "shard_02"}, 
       {$set: {host: "shard_02/<new_IP:port>,<new_IP:port>" }}
    )
    db.getSiblingDB('config').getCollection("shards").updateOne(
       {_id : "shard_03"}, 
       {$set: {host: "shard_03/<new_IP:port>,<new_IP:port>" }}
    )
    db.getSiblingDB('admin').shutdownServer()
    exit
    
    rmdir C:MongoDBdatamongocfg_2
    rmdir C:MongoDBdatamongocfg_3
    
    net start MongoDB_Config_1
    net start MongoDB_Config_2
    net start MongoDB_Config_3
    
    mongo "mongodb://user:password@localhost:27029/admin?authSource=admin"
    rs.initiate(
      {
        _id: "configRepSet",
        configsvr: true,
        members: [
          { _id: 0, host: "<new_IP:port>", priority: 10 },
          { _id: 1, host: "<new_IP:port>", priority: 5 },
          { _id: 2, host: "<new_IP:port>", priority: 5 }
        ]
      }
    )
    rs.status()
    while (! db.hello().isWritablePrimary ) { sleep(1000) }
    exit
    

    mongod Shard ReplicaSet

    Repeat below for each shard

    • Start one mongod shard server (preferably the former PRIMARY) in maintenance mode
    • Drop local database
    • Update admin.system.version
    • Shutdown mongod
    • Delete dbPath of all other shard servers
    • Start all mongod shard servers
    • Connect to first mongod shard server
    • Initiate ReplicaSet

    Example (Windows style):

    SET MAINTENANCE_LOG=--logpath C:\MongoDB\log\mongo_maintenance.log --logappend
    SET MAINTENANCE_NET=--bind_ip localhost --port 55555
    SET MAINTENANCE_MISC=--setParameter skipShardingConfigurationChecks=true --setParameter disableLogicalSessionCacheRefresh=true
    
    start mongod --dbpath C:MongoDBdatamongoshard_1prim %MAINTENANCE_LOG% %MAINTENANCE_MISC% %MAINTENANCE_NET%
    mongo --norc localhost:55555/admin 
    db.getSiblingDB('local').dropDatabase()
    db.getSiblingDB('admin').getCollection("system.version").updateOne(
       {_id : "shardIdentity"}, 
       {$set: { configsvrConnectionString: "configRepSet/<new_IP:port>,<new_IP:port>,<new_IP:port>" }}
    )
    db.getSiblingDB('admin').shutdownServer()
    exit
    
    rmdir C:MongoDBdatamongoshard_1sec*
    rmdir C:MongoDBdatamongoshard_1arb*
    
    net start MongoDB_Shard_1prim
    net start MongoDB_Shard_1sec
    net start MongoDB_Shard_1arb
    
    
    mongo "mongodb://user:password@localhost:37028/admin?authSource=admin"
    rs.initiate(
      {
        _id: "shard_01",
        members: [
          { _id: 0, host: "<new_IP:port>", priority: 10 },
          { _id: 1, host: "<new_IP:port>", priority: 5 },
          { _id: 2, host: "<new_IP:port>", arbiterOnly: true }
        ]
      }
    )
    rs.status()
    while (! db.hello().isWritablePrimary ) { sleep(1000) }
    exit
    

    mongos Router

    This one is the simplest part.
