
I’m running an AWS EKS cluster with a node group consisting of 3 t3.large instances. The cluster is on version 1.25; the nodes are on AMI 1.23.9-20220926.

When updating the AMI to 1.25.16-20240514, the update fails with error code "NodeCreationFailure" and the message "Couldn’t proceed with upgrade process as new nodes are not joining node group my-ng".

During the update 2 new nodes are started.

Executing

sudo tail -f /var/log/messages

on the new node shows the following errors:

May 27 07:05:13 ip-10-1-23-206 kubelet: I0527 07:05:13.320566    2966 prober.go:114] "Probe failed" probeType="Liveness" pod="kube-system/aws-node-tq86p" podUID=ae280721-6920-4b2e-a726-4505b73cb3cb containerName="aws-node" probeResult=failure output=<
May 27 07:05:13 ip-10-1-23-206 kubelet: {"level":"info","ts":"2024-05-27T07:05:13.316Z","caller":"/usr/local/go/src/runtime/proc.go:250","msg":"timeout: failed to connect service ":50051" within 5s"}
May 27 07:05:13 ip-10-1-23-206 kubelet: >
May 27 07:05:14 ip-10-1-23-206 kubelet: E0527 07:05:14.000518    2966 pod_workers.go:965] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="kube-system/ebs-csi-node-shpss" podUID=fdb00d19-802f-452c-8c22-d45200c9a27d

My Amazon VPC CNI add-on is in status active with version v1.18.1-eksbuild.3, and the Amazon EBS CSI driver is also active with version v1.30.0-eksbuild.1.
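
For reference, the add-on status and versions can also be listed with the AWS CLI (the cluster name my-cluster below is a placeholder):

# List managed add-ons and check the installed VPC CNI version
aws eks list-addons --cluster-name my-cluster
aws eks describe-addon --cluster-name my-cluster --addon-name vpc-cni --query 'addon.addonVersion'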

The newly created nodes disappear after a few minutes, and the node group remains on the old AMI. The update status shows the error

NodeCreationFailure: Couldn't proceed with upgrade process as new nodes are not joining node group my-ng
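
The node group's health issues may show this failure in more detail; a sketch of how to query them (again with my-cluster as a placeholder):

# Node group health issues usually state why new nodes fail to join
aws eks describe-nodegroup --cluster-name my-cluster --nodegroup-name my-ng --query 'nodegroup.health.issues'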

Any help is highly appreciated.

I’m expecting the AMI update to succeed.

2 Answers


  1. There are multiple reasons why nodes fail to join the cluster. As a start, check the kubelet logs on the worker node that failed to join to find the exact reason.

    journalctl -f -u kubelet
    

    You can refer to the AWS knowledge center article for the complete troubleshooting steps:
    https://repost.aws/knowledge-center/eks-worker-nodes-cluster

    Note: Make sure that the kube-proxy add-on is also on the recommended version for your cluster version:
    https://docs.aws.amazon.com/eks/latest/userguide/managing-kube-proxy.html
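
    A quick way to compare the installed kube-proxy version with what AWS publishes (a sketch; my-cluster is a placeholder) is:

    # Version of the kube-proxy add-on currently installed on the cluster
    aws eks describe-addon --cluster-name my-cluster --addon-name kube-proxy --query 'addon.addonVersion'

    # Add-on versions published for Kubernetes 1.25
    aws eks describe-addon-versions --addon-name kube-proxy --kubernetes-version 1.25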

  2. I faced the same issue with the exact same node and EKS versions while updating the EKS cluster and node group versions. After checking the logs, I noticed the error "cni config uninitialized", which led me to the solution.

    To solve this you need to add the following networking add-ons in the AWS EKS management console:

    • CoreDNS
    • Kube-proxy
    • VPC CNI

    Make sure to set the "conflict resolution method" to "override" while installing the add-ons. If it is not set, the installation will fail.
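
    If you prefer the CLI over the console, the equivalent should look roughly like this (a sketch; my-cluster is a placeholder, and --addon-version can be added to pin a specific version):

    # Install (or take over) the managed add-ons, overriding any self-managed configuration
    aws eks create-addon --cluster-name my-cluster --addon-name vpc-cni --resolve-conflicts OVERWRITE
    aws eks create-addon --cluster-name my-cluster --addon-name coredns --resolve-conflicts OVERWRITE
    aws eks create-addon --cluster-name my-cluster --addon-name kube-proxy --resolve-conflicts OVERWRITE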

    After installing these add-ons, my node and cluster updates went smoothly.

    Hope this helps you out! Cheers!

