I’m running an AWS EKS cluster with a node group consisting of 3 t3.large instances. The cluster is on Kubernetes version 1.25; the nodes are on AMI 1.23.9-20220926.
When updating the AMI to 1.25.16-20240514, the update fails with error code "NodeCreationFailure" and the message "Couldn’t proceed with upgrade process as new nodes are not joining node group my-ng".
During the update, 2 new nodes are started.
Executing
sudo tail -f /var/log/messages
on the new node shows the following error:
May 27 07:05:13 ip-10-1-23-206 kubelet: I0527 07:05:13.320566 2966 prober.go:114] "Probe failed" probeType="Liveness" pod="kube-system/aws-node-tq86p" podUID=ae280721-6920-4b2e-a726-4505b73cb3cb containerName="aws-node" probeResult=failure output=<
May 27 07:05:13 ip-10-1-23-206 kubelet: {"level":"info","ts":"2024-05-27T07:05:13.316Z","caller":"/usr/local/go/src/runtime/proc.go:250","msg":"timeout: failed to connect service ":50051" within 5s"}
May 27 07:05:13 ip-10-1-23-206 kubelet: >
May 27 07:05:14 ip-10-1-23-206 kubelet: E0527 07:05:14.000518 2966 pod_workers.go:965] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="kube-system/ebs-csi-node-shpss" podUID=fdb00d19-802f-452c-8c22-d45200c9a27d
My Amazon VPC CNI add-on is in status active with version v1.18.1-eksbuild.3, and the Amazon EBS CSI driver is also active with version v1.30.0-eksbuild.1.
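The add-on status and version can also be checked with the AWS CLI; a minimal example, with my-cluster as a placeholder for the cluster name:
# Show status and version of the VPC CNI and EBS CSI managed add-ons
aws eks describe-addon --cluster-name my-cluster --addon-name vpc-cni \
  --query 'addon.{status: status, version: addonVersion}'
aws eks describe-addon --cluster-name my-cluster --addon-name aws-ebs-csi-driver \
  --query 'addon.{status: status, version: addonVersion}'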
The newly created nodes disappear after a few minutes, and the node group remains on the old AMI. The update status shows the error:
NodeCreationFailure: Couldn't proceed with upgrade process as new nodes are not joining node group my-ng
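For completeness, the failed node group update and its error details can be retrieved with the AWS CLI as well; a rough sketch, with the cluster name, node group name, and update ID as placeholders:
# List updates for the node group, then inspect the failing one
aws eks list-updates --name my-cluster --nodegroup-name my-ng
aws eks describe-update --name my-cluster --nodegroup-name my-ng --update-id <update-id>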
Any help is highly appreciated.
I’m expecting the AMI update to succeed.
2 Answers
There are multiple reasons why nodes fail to join the cluster. As a start, check the kubelet logs on the worker node that failed to join for the exact reason.
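A minimal sketch of how to look at those logs on an Amazon Linux worker node, assuming SSH or SSM access to the instance:
# On the worker node that failed to join the cluster
sudo journalctl -u kubelet --no-pager | tail -n 100   # kubelet service logs
sudo tail -n 100 /var/log/messages                    # system log, as shown in the question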
You can refer to the AWS documentation for complete troubleshooting steps:
https://repost.aws/knowledge-center/eks-worker-nodes-cluster
Note: Make sure that the kube-proxy add-on is also on the recommended version:
https://docs.aws.amazon.com/eks/latest/userguide/managing-kube-proxy.html
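A quick way to compare the installed kube-proxy add-on version against the versions published for Kubernetes 1.25 with the AWS CLI (cluster name is a placeholder):
# Currently installed kube-proxy add-on version
aws eks describe-addon --cluster-name my-cluster --addon-name kube-proxy \
  --query 'addon.addonVersion'
# Available versions for Kubernetes 1.25; defaultVersion=true marks the recommended one
aws eks describe-addon-versions --addon-name kube-proxy --kubernetes-version 1.25 \
  --query 'addons[].addonVersions[].{version: addonVersion, default: compatibilities[0].defaultVersion}'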
I faced the same issue with the exact same node AMI and EKS versions while updating the EKS cluster version and node version. After checking the logs, I noticed the error "cni config uninitialized", which led me to the solution.
To solve this, you need to add the EKS networking add-ons (such as Amazon VPC CNI and kube-proxy) in the AWS EKS management console.
Make sure to set the "conflict resolution method" to "Override" while installing the add-ons. If it is not set, the installation will fail.
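If you use the AWS CLI instead of the console, roughly the same thing can be done as below; the cluster name is a placeholder, and the CLI names the conflict setting OVERWRITE:
# Install a managed networking add-on, overwriting any existing self-managed config
aws eks create-addon --cluster-name my-cluster --addon-name kube-proxy \
  --resolve-conflicts OVERWRITE
# If the add-on is already installed as a managed add-on, update it instead
aws eks update-addon --cluster-name my-cluster --addon-name vpc-cni \
  --resolve-conflicts OVERWRITE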
After installing these add-ons, my node and cluster updates went smoothly.
Hope this helps you out! Cheers!