I am new to Kubernetes and want to ask two questions that may sound really basic.
- I understand that node controllers monitor and respond to nodes’ status, and that they evict pods from unhealthy nodes. However, will the Kubernetes cluster make any effort to actively recover the unhealthy nodes, or will it only wait for a human to recover the nodes manually?
- If one container goes down in the cluster, will the cluster make any effort to actively recover it? I did some experiments in which I intentionally used `docker stop` to stop some of the containers (nginx-proxy, kube-controller-manager, kube-proxy). The containers I stopped didn’t seem to come back automatically. Does this mean that containers that go down in the cluster won’t come back until a human intervenes?
Thank you in advance.
2 Answers
Kubernetes doesn’t support auto-healing of nodes out of the box. However, this varies with the cloud provider. For example, GKE has a node auto-repair feature that monitors node health and triggers an automatic repair event (currently a node recreation for `NotReady` nodes).

Kubernetes is a container orchestration tool, and it will try to restart pods that fail for some reason if they were created as part of a Kubernetes object (`Deployment`, `StatefulSet`, etc.) that provides such a restart policy. If you create a standalone `Pod` resource, the pod will not be restarted upon completion. Also, since you used `docker` directly to stop the containers, there is no Kubernetes object associated with them to handle their lifecycle, so they would not start again either, as you observed. A minimal sketch of such a Deployment is shown below.
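For illustration, here is a minimal Deployment sketch (the name, labels, and image tag are placeholders, not taken from the question) showing the kind of object whose controller recreates pods, unlike a standalone `Pod` or a container started directly with `docker`:

```yaml
# Minimal Deployment sketch; names and the image tag are illustrative placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-demo
spec:
  replicas: 1                # the ReplicaSet keeps this many pods running
  selector:
    matchLabels:
      app: nginx-demo
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      # restartPolicy defaults to Always for pods created this way, so the
      # kubelet restarts a crashed container, and the ReplicaSet replaces the
      # whole pod if it is evicted or deleted.
      containers:
      - name: nginx
        image: nginx:1.25    # placeholder image tag
```

If you delete the pod this Deployment creates (or its container exits), a replacement pod appears shortly afterwards; deleting a standalone `Pod` with no such owner does not trigger any replacement.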
Basically, the answer is no: the Kubernetes cluster will not actively try to recover unhealthy nodes. A node is a physical or virtual machine, and the responsibility for restarting or fixing an unhealthy node lies with the admin. Two commands are useful here:

- `kubectl cluster-info dump`: get complete information about overall cluster health.
- `kubectl get nodes`: list the available nodes; you can verify that all of the nodes you expect to see are present and that they are all in the `Ready` state.

When a container fails, it causes its pod to fail as well. When a pod fails, Kubernetes restarts it immediately. If you look at the pod YAML with `kubectl get pods <pod_name> -o yaml`, you will see a section named `ownerReferences` where the owner of the pod is specified. The owner is typically a `ReplicaSet` or `StatefulSet`, and it restarts the pod when it fails (to maintain the desired number of pods). As the pod is restarted, its containers start again too, and this is how Kubernetes recovers from container failures. A rough sketch of what the `ownerReferences` section looks like is shown below.
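As a rough illustration (the names and uid are made up, not real output), the `ownerReferences` section of `kubectl get pods <pod_name> -o yaml` looks something like this for a pod owned by a `ReplicaSet`:

```yaml
# Excerpt of a pod's metadata as returned by `kubectl get pods <pod_name> -o yaml`;
# the names and uid are illustrative placeholders.
metadata:
  name: nginx-demo-7d4b9c6f9b-x2k4p
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet          # the controller that recreates this pod if it fails
    name: nginx-demo-7d4b9c6f9b
    uid: 1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d
    controller: true
    blockOwnerDeletion: true
```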