We have a 6-node Cassandra 3.11.3 cluster running on Ubuntu 16.04. These are virtual machines.
We are switching to physical machines: 8 brand-new servers that will run Debian 11 and presumably Cassandra 3.11.12.
Since the version stays within 3.11.x and Ubuntu 16.04 is out of support, the question is: can we just let the new machines join the old cluster and then decommission the outdated ones?
I hope to get some tips about this, because intuitively it seems fine but we are not too sure about it.
Thank you.
2 Answers
Quick tip here: it’s a good idea to build your clusters in multiples of your RF. Not sure what your RF is, but if RF=3, I’d either stay with six nodes or get one more and go to nine. It’s all about even data distribution.
In short, no. You’ll want to upgrade the existing nodes to 3.11.12 first. I can’t recall whether 3.11.3 and 3.11.12 are SSTable compatible, but I wouldn’t risk it.
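As a rough sketch of that in-place upgrade, done one node at a time: the script below only prints the commands unless DRY_RUN=0 is set. The `cassandra` service name and the availability of a `cassandra=3.11.12` package in your apt repository are assumptions, not something stated in the question.

```shell
#!/usr/bin/env bash
# Sketch: upgrade ONE existing node in place; repeat node by node.
# Assumptions: a service named "cassandra" and an apt repository that
# offers the target package version.
TARGET_VERSION="${TARGET_VERSION:-3.11.12}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

upgrade_node() {
  run nodetool drain                                # flush memtables, stop accepting writes
  run sudo systemctl stop cassandra
  run sudo apt-get install -y "cassandra=$TARGET_VERSION"
  run sudo systemctl start cassandra
  run nodetool version                              # confirm the running version
  run nodetool status                               # node should be back as UN before moving on
}

upgrade_node
```

Wait until the node shows up as UN again in nodetool status before touching the next one.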
Secondly, the best way to do this is to build your new (physical) nodes in the cluster as their own logical data center. Start them up empty, and then run nodetool rebuild on each. Once that’s complete, decommission the old nodes.

There is a slightly simpler solution: move the data from each virtual machine to a physical server, as follows:
1. Stop Cassandra in the virtual machine and make sure that it won’t start again.
2. Copy /var/lib/cassandra (or whatever your data directory is) from the VM to the physical server.

Repeat that process for all VM nodes, updating the seeds etc. at some point along the way. After the process is finished, you can add the two remaining physical servers. Also, to speed up the process, you can do an initial copy of the data before stopping Cassandra in the VM and, after it’s stopped, re-sync the data with rsync or something similar. This way you can minimize the downtime.

This approach is much faster than adding a new node and decommissioning the old one, as we don’t need to stream the data twice. It works because, once a node is initialized, Cassandra identifies nodes by their assigned UUID, not by IP address.
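The copy, stop, and re-sync sequence can be sketched roughly like this, run on the VM. The host name new-physical-01, the `cassandra` service name, and the assumption that everything lives under /var/lib/cassandra are all placeholders; the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: move one node's data from a VM to a physical server.
# Assumptions: target host name, data path, and a service named
# "cassandra"; run as a user that can read the data directory.
TARGET_HOST="${TARGET_HOST:-new-physical-01}"   # hypothetical host name
DATA_DIR="${DATA_DIR:-/var/lib/cassandra}"
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

migrate_node() {
  # Initial copy while Cassandra is still running on the VM.
  run rsync -a "$DATA_DIR/" "$TARGET_HOST:$DATA_DIR/"
  # Flush memtables, then stop the service and keep it from starting again.
  run nodetool drain
  run sudo systemctl stop cassandra
  run sudo systemctl disable cassandra
  # Final sync; --delete drops files that were compacted away since the first copy.
  run rsync -a --delete "$DATA_DIR/" "$TARGET_HOST:$DATA_DIR/"
}

migrate_node
```

The physical server still needs its own configuration (same cluster name, its own addresses, updated seeds) before Cassandra is started there.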
Another approach is to follow the instructions for replacing a dead node. In this case the data is streamed only once, but it could be a bit slower than copying the data directly.
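A minimal sketch of the dead-node replacement route, run on the new physical server after the corresponding VM node has been stopped. It relies on the `cassandra.replace_address_first_boot` system property; the IP address 10.0.0.11 and the location /etc/cassandra/cassandra-env.sh are assumptions for illustration. In dry-run mode (the default) nothing is changed.

```shell
#!/usr/bin/env bash
# Sketch: let a new, empty node take over the tokens of a stopped node.
# Assumptions: the old node's IP and the env file location below.
OLD_NODE_IP="${OLD_NODE_IP:-10.0.0.11}"                   # hypothetical address of the stopped VM
ENV_FILE="${ENV_FILE:-/etc/cassandra/cassandra-env.sh}"   # assumed package default
DRY_RUN="${DRY_RUN:-1}"                                   # 1 = print only, 0 = apply

# Build the line that tells Cassandra which node it replaces on first boot.
replace_opt() {
  printf 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=%s"\n' "$1"
}

prepare_replacement() {
  local line
  line="$(replace_opt "$OLD_NODE_IP")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "would append to $ENV_FILE: $line"
    echo "+ sudo systemctl start cassandra"
  else
    echo "$line" | sudo tee -a "$ENV_FILE" > /dev/null
    sudo systemctl start cassandra
  fi
}

prepare_replacement
```

The new node should start with an empty data directory and should not be in its own seed list, otherwise it will not bootstrap and stream the data.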