
I had set up a working Docker Swarm cluster, but when I came back to it after several months, I found that nothing works.

While troubleshooting, I found this error in the docker info output:

 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: error
  NodeID: 
  Error: error while loading TLS certificate in /var/lib/docker/swarm/certificates/swarm-node.crt: certificate (1 - s3htdkgcv9qifg2jmbpud1gt7) not valid after Sun, 27 Mar 2022 10:27:00 UTC, and it is currently Sun, 19 Jun 2022 04:33:54 UTC: x509: certificate has expired or is not yet valid: 
  Is Manager: false
  Node Address: 10.10.1.10
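To confirm the expiry independently of the daemon, the certificate file named in the error can be inspected directly. This is a minimal sketch, assuming openssl is installed on the node. The function name check_swarm_cert is hypothetical, and the default path is the one from the error message above.

```shell
#!/bin/sh
# Path taken from the error message; override with CERT=... if yours differs.
CERT="${CERT:-/var/lib/docker/swarm/certificates/swarm-node.crt}"

# Hypothetical helper: print the expiry date of a certificate and report
# whether it has expired. Returns 0 if valid, 1 if expired, 2 if unreadable.
check_swarm_cert() {
    cert="$1"
    if [ ! -r "$cert" ]; then
        echo "cannot read $cert" >&2
        return 2
    fi
    # Print the notAfter date of the certificate.
    openssl x509 -noout -enddate -in "$cert"
    # -checkend 0 exits non-zero if the certificate has already expired.
    if openssl x509 -noout -checkend 0 -in "$cert" >/dev/null; then
        echo "certificate is still valid"
    else
        echo "certificate has expired"
        return 1
    fi
}
```

Calling check_swarm_cert "$CERT" usually needs root, because /var/lib/docker is normally readable by root only.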

I have tried what I found online, such as this answer: https://stackoverflow.com/a/59086699/5442187

docker swarm leave

and then tried to rejoin

docker swarm join-token manager

=>

Error response from daemon: This node is not a swarm manager. Use
"docker swarm init" or "docker swarm join" to connect this node to
swarm and try again.

And

docker swarm join-token worker

=>

Error response from daemon: This node is not a swarm manager. Use
"docker swarm init" or "docker swarm join" to connect this node to
swarm and try again.

How do I rejoin or reclaim this cluster? I would expect this to be possible; otherwise Docker Swarm is a no-go for production.

3 Answers


  1. Rotate the swarm CA via docker swarm ca --rotate.

    The root CA rotation will not be completed until all registered nodes have rotated their TLS certificates. If the rotation is not completing within a reasonable amount of time, try running docker node ls --format '{{.ID}} {{.Hostname}} {{.Status}} {{.TLSStatus}}' to see if any nodes are down or otherwise unable to rotate TLS certificates.

    See https://docs.docker.com/engine/reference/commandline/swarm_ca/
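The node check described in this answer can be narrowed to the nodes that still need attention. This is a rough sketch, assuming it runs on a working manager and that hostnames contain no spaces. pending_tls_nodes is a hypothetical helper name, not a Docker command.

```shell
#!/bin/sh
# Hypothetical helper: list the nodes whose TLS status is anything other
# than "Ready", i.e. nodes that have not yet picked up the rotated CA.
pending_tls_nodes() {
    docker node ls --format '{{.ID}} {{.Hostname}} {{.Status}} {{.TLSStatus}}' |
        awk '$4 != "Ready"'   # field 4 is the first word of the TLS status
}
```

After docker swarm ca --rotate, an empty result from pending_tls_nodes suggests that every registered node has rotated its certificate.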

  2. There were just 2 nodes in the cluster, and both of them report "Is Manager: false". The commands were run on both nodes, and none of them work.

    Once all managers have left the cluster, I believe it is gone. Before then you could have run the following on one of the managers:

    docker swarm init --force-new-cluster
    

    Now that they’ve all left, you can recreate the cluster from scratch:

    # on the manager
    docker swarm init
    

    Once you have a new cluster, on the manager run:

    docker swarm join-token manager # or worker
    

    Then run the output of the join-token command above on the other nodes to join to the cluster.
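The rebuild described in this answer could be scripted roughly as follows. This is a sketch under assumptions: it treats 10.10.1.10 (the node address from the question's output) as the new manager, it uses the default swarm management port 2377, and the function names are hypothetical.

```shell
#!/bin/sh
# Assumed manager address; the question shows 10.10.1.10 as a node address.
MANAGER_ADDR="${MANAGER_ADDR:-10.10.1.10}"

# Run on the node that will become the new manager.
init_manager() {
    docker swarm init --advertise-addr "$MANAGER_ADDR"
}

# Run on the manager: print the join command for the other node.
# $1 is the role to join as, "worker" (default) or "manager".
print_join_command() {
    role="${1:-worker}"
    token=$(docker swarm join-token -q "$role") || return 1
    echo "docker swarm join --token $token $MANAGER_ADDR:2377"
}
```

Run init_manager once on the manager, then run the line printed by print_join_command on the second node. Services and stacks from the old cluster are not carried over and have to be redeployed.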

  3. There’s a way to recover without losing the deployed swarm services/stacks.
    The error complains that the certificate is "not valid after Sun, 27 Mar 2022 10:27:00 UTC". So first make the certificate valid again, then recover the swarm services, and rotate the CA certificate once the swarm is up and running:

    1. Stop the Docker service:

      service docker stop

    2. Set the system date back to "27 Mar 2022 10:27:00" or, more safely, somewhat earlier:

      date -s "27 Mar 2022 10:27:00"

    3. Bring up the swarm services:

      service docker start

      # check that all the services are up and running

      docker stack ls

    4. Rotate the certificate:

      docker swarm ca --rotate

    5. Set the system date back to the current time:

      date -s "19 Apr 2023 06:34:00"
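The five steps above can be collected into one script. This is a sketch that mirrors the answer's steps, not a verified recovery procedure. It assumes root privileges, a date command that accepts -s "@<epoch seconds>" (as in GNU coreutils), and that automatic time synchronisation is paused so the clock is not corrected midway. The name recover_swarm and the default ROLLBACK_TO value (one day before the expiry in the error) are assumptions. Instead of typing the current date by hand in step 5, it restores the clock from the time recorded before the rollback.

```shell
#!/bin/sh
# Assumed rollback time: one day before the "not valid after" time in the error.
ROLLBACK_TO="${ROLLBACK_TO:-26 Mar 2022 10:27:00}"

# Hypothetical helper: roll the clock back, restart Docker, rotate the CA,
# then restore the real time. Returns non-zero if a Docker step failed.
recover_swarm() {
    real_before=$(date +%s)              # real time in epoch seconds
    service docker stop || return 1
    date -s "$ROLLBACK_TO" || return 1   # certificate is valid at this time
    rolled_start=$(date +%s)             # rolled-back time in epoch seconds
    status=0
    # Start Docker, check the stacks, then rotate the CA.
    service docker start && docker stack ls && docker swarm ca --rotate || status=$?
    # Always restore the clock: original time plus the seconds spent rolled back.
    elapsed=$(( $(date +%s) - rolled_start ))
    date -s "@$(( real_before + elapsed ))"
    return "$status"
}
```

Restoring from the recorded time avoids leaving the node on a stale date if one of the Docker steps fails partway through.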
