How do I recover a Docker Swarm cluster after its certificate expired?

uberrebu
June 19, 2022
268 views
3 votes
3 Answers

I had set up a working Docker Swarm cluster. After several months I tried to use it again and noticed that nothing works.

While troubleshooting, I found this error:

```
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: error
  NodeID: 
  Error: error while loading TLS certificate in /var/lib/docker/swarm/certificates/swarm-node.crt: certificate (1 - s3htdkgcv9qifg2jmbpud1gt7) not valid after Sun, 27 Mar 2022 10:27:00 UTC, and it is currently Sun, 19 Jun 2022 04:33:54 UTC: x509: certificate has expired or is not yet valid: 
  Is Manager: false
  Node Address: 10.10.1.10
```
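To confirm the expiry date independently of Docker, the node certificate named in the error can be inspected with openssl. This is a minimal sketch: the function name `check_swarm_cert` is hypothetical, it assumes openssl is installed, and it defaults to the certificate path from the error above.

```shell
# Report when a swarm certificate expires and whether it is still valid.
# Run as root, because /var/lib/docker is only readable by root.
# Returns 0 if valid, 1 if expired, 2 if the file cannot be read.
check_swarm_cert() {
  cert="${1:-/var/lib/docker/swarm/certificates/swarm-node.crt}"
  # Prints a line of the form "notAfter=<date>"
  openssl x509 -in "$cert" -noout -enddate || return 2
  # -checkend 0 exits with status 0 if the certificate has not expired yet
  if openssl x509 -in "$cert" -noout -checkend 0 >/dev/null; then
    echo "certificate is still valid"
  else
    echo "certificate has expired"
    return 1
  fi
}
```

Sourced into a root shell, `check_swarm_cert` with no argument checks the node certificate; pass another path to check a different PEM certificate.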

I tried what I found online, for example here: https://stackoverflow.com/a/59086699/5442187

```
docker swarm leave
```

and then tried to rejoin:

```
docker swarm join-token manager
Error response from daemon: This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.
```

And:

```
docker swarm join-token worker
Error response from daemon: This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.
```

How do I rejoin or reclaim this cluster? I would expect this to be possible; otherwise Docker Swarm is a no-go for production.

Answers

- DimaKorobskiy
- July 18, 2022 at 9:36 pm
- 0 votes
Rotate the swarm CA via docker swarm ca --rotate.

The root CA rotation will not be completed until all registered nodes have rotated their TLS certificates. If the rotation is not completing within a reasonable amount of time, try running docker node ls --format '{{.ID}} {{.Hostname}} {{.Status}} {{.TLSStatus}}' to see if any nodes are down or otherwise unable to rotate TLS certificates.

See https://docs.docker.com/engine/reference/commandline/swarm_ca/
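The two commands above can be combined into a small helper, sketched here under the assumption that it runs on a manager of a swarm that is still working (the hypothetical `rotate_swarm_ca` name is only for illustration). Without `--detach`, `docker swarm ca --rotate` waits for the rotation to converge, so the node listing afterwards shows the final TLS status.

```shell
# Rotate the swarm root CA, then list each node's TLS status.
# Must run on a manager node of a swarm that is still running.
rotate_swarm_ca() {
  # Blocks until every node has a certificate signed by the new root CA
  docker swarm ca --rotate || return 1
  # TLSStatus shows whether a node still needs to rotate its certificate
  docker node ls --format '{{.ID}} {{.Hostname}} {{.Status}} {{.TLSStatus}}'
}
```

If the rotation hangs, a node that is down or unreachable will usually show up in the listing, which is the check the answer recommends.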


- BMitch
- July 19, 2022 at 3:09 pm
- 0 votes
> There were just 2 nodes in the cluster and both of them show "Is Manager: false". The commands were run on both nodes and none of them work.

Once all managers have left the cluster, I believe it is gone. Before then you could have run the following on one of the managers:
```
docker swarm init --force-new-cluster
```
Now that they’ve all left, you can recreate the cluster from scratch:
```
# on the manager
docker swarm init
```
Once you have a new cluster, on the manager run:
```
docker swarm join-token manager # or worker
```
Then run the output of the join-token command above on the other nodes to join the cluster.
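The recreate-from-scratch steps can be sketched as one helper run on the node that should become the new manager. The function name is hypothetical, and the advertise address is an assumption to adapt to your network (10.10.1.10 is the node address from the question). A new cluster starts empty, so previously deployed stacks have to be deployed again.

```shell
# Recreate the swarm from scratch once every manager has left it.
# Run on the node that should become the new manager; $1 is the address
# the other nodes will use to reach it.
recreate_swarm() {
  [ -n "${1:-}" ] || { echo "usage: recreate_swarm <advertise-addr>" >&2; return 2; }
  docker swarm init --advertise-addr "$1" || return 1
  # Each command prints the complete "docker swarm join ..." line for that
  # role; run the printed line on the other nodes.
  docker swarm join-token worker
  docker swarm join-token manager
}
```

For example, `recreate_swarm 10.10.1.10`, then paste the printed worker or manager join command on the second node.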

- Qiushi
- April 19, 2023 at 12:36 am
- 0 votes
There’s a way to recover without losing the deployed swarm services/stacks.
The error complains that the certificate is "not valid after Sun, 27 Mar 2022 10:27:00 UTC". So first make the certificate valid again by setting the clock back, then recover the swarm services, and rotate the CA certificate once the swarm is up and running:
1. Stop the Docker service:
  
  service docker stop
2. Set the date back to before "27 Mar 2022 10:27:00" (the "not valid after" time from the error); going further back is fine. For example, one day earlier:
  
  date -s "26 Mar 2022 10:27:00"
3. Bring up the swarm services:
  
  service docker start
  
  # check that all the services are up and running
  
  docker stack ls
4. Rotate the certificate:
  
  docker swarm ca --rotate
5. Set the system date back to the current time:
  
  date -s "19 Apr 2023 06:34:00"
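The steps above can be collected into one script, sketched here with several assumptions that are not part of the original answer: the host uses systemd, so `timedatectl set-ntp false` keeps NTP from undoing the rollback; the function name is hypothetical; and `--cert-expiry 17520h` (2 years) is added because node certificates issued while the clock is wound back count their validity (90 days by default) from that past date and might otherwise already be expired once the real time is restored.

```shell
# Clock-rollback recovery. Run as root on the manager.
# $1 must be a time before the "not valid after" time in the error.
# Assumptions: systemd host (timedatectl), and the --cert-expiry value.
recover_with_old_clock() {
  [ -n "${1:-}" ] || { echo "usage: recover_with_old_clock <past-date>" >&2; return 2; }
  real_now=$(date -u '+%Y-%m-%d %H:%M:%S UTC')   # remember the real time
  timedatectl set-ntp false || return 1            # stop NTP from correcting the clock
  service docker stop || return 1
  date -s "$1" || return 1
  service docker start || return 1
  # Wait up to about a minute for the node to report an active swarm
  tries=0
  until [ "$(docker info --format '{{.Swarm.LocalNodeState}}')" = "active" ]; do
    tries=$((tries + 1))
    if [ "$tries" -ge 30 ]; then echo "swarm is not active" >&2; return 1; fi
    sleep 2
  done
  docker stack ls                                  # check that the stacks are back
  # Issue new certificates valid long enough to cover the real date
  docker swarm ca --rotate --cert-expiry 17520h || return 1
  date -s "$real_now"                              # slightly behind; NTP catches up
  timedatectl set-ntp true
}
```

For example, `recover_with_old_clock "26 Mar 2022 10:27:00"`. If the function stops early, the clock is left in the past, so run `timedatectl set-ntp true` to restore it. As the first answer notes, the rotation does not complete while any node is down or unable to rotate its certificate.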
