rancher rke up errors on etcd host health checks remote error: tls: bad certificate - Debian

arnittocrab
March 24, 2022

rke --debug up --config cluster.yml

fails during the health checks on the etcd hosts with this error:

DEBU[0281] [etcd] failed to check health for etcd host [x.x.x.x]: failed to get /health for host [x.x.x.x]: Get "https://x.x.x.x:2379/health": remote error: tls: bad certificate

Checking the etcd health endpoints, using the certificates configured in the etcd container:

# Query /health on every client URL that etcd reports for its members.
for endpoint in $(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5"); do
  echo "Validating connection to ${endpoint}/health"
  curl -w "\n" \
    --cacert "$(docker exec etcd printenv ETCDCTL_CACERT)" \
    --cert "$(docker exec etcd printenv ETCDCTL_CERT)" \
    --key "$(docker exec etcd printenv ETCDCTL_KEY)" \
    "${endpoint}/health"
done

Output when run on that master node:
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}

The same check can be run manually against a single host to see whether it responds correctly:
curl -w "\n" --cacert /etc/kubernetes/ssl/kube-ca.pem --cert /etc/kubernetes/ssl/kube-etcd-x-x-x-x.pem --key /etc/kubernetes/ssl/kube-etcd-x-x-x-x-key.pem https://x.x.x.x:2379/health

Checking the hashes of my self-signed certificates (the file on disk against the copy in cluster.rkestate):

# md5sum /etc/kubernetes/ssl/kube-ca.pem
f5b358e771f8ae8495c703d09578eb3b  /etc/kubernetes/ssl/kube-ca.pem

# for key in $(jq -r '.desiredState.certificatesBundle | keys[]' /home/kube/cluster.rkestate); do echo $(jq -r --arg key "$key" '.desiredState.certificatesBundle[$key].certificatePEM' /home/kube/cluster.rkestate | sed '$ d' | md5sum) "$key"; done | grep kube-ca
f5b358e771f8ae8495c703d09578eb3b - kube-ca
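
The single-certificate check above can be extended to every entry in the state file. What follows is a minimal sketch and not part of rke: the function names state_cert_md5 and compare_certs are made up for illustration, and it assumes that each key under .desiredState.certificatesBundle corresponds to a file named <key>.pem in the SSL directory, as kube-ca does above. It needs jq and md5sum.

```shell
#!/bin/sh
# Hypothetical helpers; the key-to-filename mapping is an assumption.

# Print the MD5 of one certificate as recorded in the RKE state file.
# sed '$ d' drops the extra trailing line that jq -r adds to the PEM string.
state_cert_md5() {
  state_file=$1
  cert_name=$2
  jq -r --arg key "$cert_name" \
    '.desiredState.certificatesBundle[$key].certificatePEM' "$state_file" \
    | sed '$ d' | md5sum | cut -d' ' -f1
}

# Compare every certificate in the state file with <ssl_dir>/<name>.pem
# and print one status line per certificate.
compare_certs() {
  state_file=$1
  ssl_dir=$2
  jq -r '.desiredState.certificatesBundle | keys[]' "$state_file" |
  while read -r name; do
    pem="$ssl_dir/$name.pem"
    if [ ! -f "$pem" ]; then
      echo "MISSING  $name"
    elif [ "$(md5sum < "$pem" | cut -d' ' -f1)" = "$(state_cert_md5 "$state_file" "$name")" ]; then
      echo "OK       $name"
    else
      echo "MISMATCH $name"
    fi
  done
}
```

With the paths from this question it would be called as compare_certs /home/kube/cluster.rkestate /etc/kubernetes/ssl. A MISSING line may simply mean that the certificate is not deployed on the node where the check runs; a MISMATCH line points at a certificate whose copy in the state file differs from the one on disk.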

Versions on my master node:
Debian GNU/Linux 10
rke version v1.3.1
docker version Version: 20.10.8
kubectl v1.21.5
v1.21.5-rancher1-1

I think my cluster.rkestate has gone bad. Are there any other locations where the rke tool checks for certificates?
I currently cannot do anything with this production cluster and want to avoid downtime. I experimented with different scenarios on a test cluster. As a last resort I could recreate the cluster from scratch with the command below, but maybe I can still fix it…
rke remove && rke up

Answers

Chosen as BEST ANSWER
- arnittocrab
- March 25, 2022 at 4:15 pm
- 0 votes
rke util get-state-file helped me reconstruct the bad cluster.rkestate file. After that I was able to run rke up successfully and add a new master node, which fixed the whole situation.
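
For readers who want to try the same route, here is a minimal sketch of the sequence, under stated assumptions. The function names state_path_for and rebuild_state are hypothetical, the config file is assumed to end in .yml, and the state file is assumed to sit next to it with the same base name (cluster.yml and cluster.rkestate, as in the question). The sketch copies the suspect state file first so that nothing is lost if the retrieval does not help.

```shell
#!/bin/sh
# Hypothetical helpers; file naming is an assumption based on the question.

# Derive the state file path that belongs to a config file
# (assumption: same directory and base name, ".rkestate" extension).
state_path_for() {
  printf '%s\n' "${1%.yml}.rkestate"
}

# Keep a timestamped copy of the current state file, then ask rke to
# retrieve the state file from the cluster.
rebuild_state() {
  config=$1
  state=$(state_path_for "$config")
  if [ -f "$state" ]; then
    cp -p "$state" "$state.bak.$(date +%Y%m%d%H%M%S)" || return 1
  fi
  rke util get-state-file --config "$config"
}
```

A possible invocation would be rebuild_state cluster.yml && rke up --config cluster.yml, run from the directory that holds cluster.yml.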

- MostafaGhadimi
- January 28, 2023 at 9:32 am
- 0 votes
The problem can be solved with the following steps:
1. Remove the kube_config_cluster.yml file from the directory where you run the rke up command (since some data is missing on your K8s nodes).
2. Remove the cluster.rkestate file.
3. Re-run the rke up command.
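
Both files named in these steps hold cluster certificates or credentials, so a cautious variant is to move them aside instead of deleting them. The sketch below is only an illustration of that idea: set_aside and the rke-backup-<timestamp> directory name are hypothetical and not part of rke.

```shell
#!/bin/sh
# Hypothetical helper: move the named files from a directory into a
# timestamped backup directory instead of deleting them.
# Prints the backup directory; files that do not exist are skipped.
set_aside() {
  dir=$1
  shift
  backup="$dir/rke-backup-$(date +%Y%m%d%H%M%S)"
  mkdir -p "$backup" || return 1
  for f in "$@"; do
    if [ -e "$dir/$f" ]; then
      mv "$dir/$f" "$backup/" || return 1
    fi
  done
  echo "$backup"
}
```

The three steps would then read set_aside . kube_config_cluster.yml cluster.rkestate && rke up --config cluster.yml, and the original files remain available in the printed backup directory.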