We have a stateful Redis deployment on a multi-node Kubernetes cluster (v1.27.15).
There are two services, named "redis" and "redis-headless".
There are 3 nodes in the cluster. When we shut down one of the nodes, the Redis pod on that node becomes Terminating:
kubectl get pods -A -o wide | grep redis
mynamespace redis-node-0 3/3 Running 0 8m 10.244.248.4 ha3-node2
mynamespace redis-node-1 3/3 Terminating 0 68m 10.244.230.119 ha3-node1
mynamespace redis-node-2 3/3 Running 0 67m 10.244.192.208 ha3-node3
But for the redis-headless service, 10.244.230.119 is still listed in the endpoints:
kubectl describe endpoints -n mynamespace redis-headless
Name: redis-headless
Namespace: mynamespace
Subsets:
Addresses: 10.244.192.208,10.244.230.119,10.244.248.4
For the redis service (ClusterIP), the endpoints are OK (10.244.230.119 has been removed).
Is this behaviour normal for a headless service? If not, what is the solution?
Regards,
Yavuz
2 Answers
This is working as intended; this is how Kubernetes works. The pod deletion and the EndpointSlice update are parallel processes, and there is no guarantee that one will complete before the other. Besides that, there are also all the ingress/load balancer backends that need to be updated with the new EndpointSlice information, and that too is not guaranteed to happen before the pod is stopped. This is the reason for our recommendation to use a sleep in the preStop hook; this should resolve the issue.

If the endpoint is removed before the containers receive the TERM signal, no new requests will arrive while the containers are terminating. If the containers start terminating before the endpoint is removed, the pod will continue to receive requests, and those requests will get “Connection timeout” or “Connection refused” errors as responses. Because the endpoint removal must propagate to every node in the cluster before it is complete, there is a high probability that the pod eviction process completes first. The learnk8s article by Daniele Polencic on graceful shutdown in Kubernetes has detailed information on this.
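For illustration, a preStop sleep could look like the sketch below. The pod name, container name, image, and 15-second delay are assumptions, not values from your deployment; terminationGracePeriodSeconds must be long enough to cover the sleep plus Redis's own shutdown time:

apiVersion: v1
kind: Pod
metadata:
  name: redis-node-example            # hypothetical name
spec:
  terminationGracePeriodSeconds: 30   # must exceed the preStop sleep
  containers:
    - name: redis                     # assumed container name
      image: redis:7                  # assumed image
      lifecycle:
        preStop:
          exec:
            # Keep the container alive while the endpoint removal
            # propagates, so no new traffic is routed here by the
            # time Redis receives SIGTERM.
            command: ["sh", "-c", "sleep 15"]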
The headless service likely has publishNotReadyAddresses set to true in the service manifest. If it does, the IP of the terminating pod can still be shown in the Endpoints resource until the pod is fully terminated; once the controller recreates the pod, the new IP will show up. After all, a headless service is not handled by kube-proxy: clients connect directly to the Pods via cluster DNS – that is the whole point.
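To check whether that is the case, you can read the field straight off the Service (using the namespace from your question):

kubectl get service redis-headless -n mynamespace -o jsonpath='{.spec.publishNotReadyAddresses}'

And for reference, a minimal headless Service manifest with the field set; the selector and port here are assumptions, not taken from your cluster:

apiVersion: v1
kind: Service
metadata:
  name: redis-headless
  namespace: mynamespace
spec:
  clusterIP: None                  # this is what makes the service headless
  publishNotReadyAddresses: true   # keeps not-ready/terminating pod IPs in the endpoints
  selector:
    app: redis                     # assumed label selector
  ports:
    - port: 6379
      targetPort: 6379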
So, based on the above, this behaviour is normal in my opinion.