I’m experiencing an issue upgrading kube-proxy from 1.21 to 1.22.
I already updated the control-plane components (apiserver, scheduler and controller-manager) to 1.22 without any problem.
When I updated the first worker node (kubelet and kube-proxy) from 1.21 to 1.22, the LoadBalancer Service on the node became unreachable; reverting to 1.21 fixed the problem.

I verified that ARP requests receive replies with the correct MAC address, and with tcpdump on the node’s NIC I see the expected traffic flow.

After a bit of investigation into the iptables rules on the worker node, I noticed that on the 1.22 node I have this rule (nat table):

-A KUBE-XLB-GYH4OE6JZWRDML2Y -m comment --comment "swp-customer/swpc-25abfa45-ac5c-487f-81b9-178602c569f3:http has no local endpoints" -j KUBE-MARK-DROP

On the 1.21 node, instead, I have these rules:

-A KUBE-XLB-B67G6CBBIZ3WMS7Y -m comment --comment "Balancing rule 0 for swp-customer/swpc-2ad2a9e3-25cf-430e-893b-dbd4ec77b197:http" -j KUBE-SEP-3LIV6VCSPFRWVHFU
-A KUBE-SEP-3LIV6VCSPFRWVHFU -p tcp -m comment --comment "swp-customer/swpc-2ad2a9e3-25cf-430e-893b-dbd4ec77b197:http" -m tcp -j DNAT --to-destination 10.244.1.219:80

The second one, on the 1.21 node, is the rule that correctly NATs traffic to the container.
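
For reference, the rules above can be dumped on each node with something like:

# list the kube-proxy NAT rules generated for the services above
iptables-save -t nat | grep 'swp-customer/swpc-'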

I guess that kube-proxy 1.22 thinks there are no local endpoints (reverting to kube-proxy 1.21 on the same node works fine), but I can’t figure out why. kube-proxy seems to start normally and there is nothing strange in its log.
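
To check this theory (an assumption on my part: kube-proxy 1.22 proxies from EndpointSlices and, as far as I understand, compares each endpoint’s nodeName with its own node name to decide which endpoints are local), the EndpointSlices of the service can be inspected to see whether they actually carry the worker node’s name:

# inspect the EndpointSlices backing the service (service name taken from the rule above)
kubectl -n swp-customer get endpointslices \
  -l kubernetes.io/service-name=swpc-25abfa45-ac5c-487f-81b9-178602c569f3 -o yaml | grep nodeName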

My environment:

  • k8s nodes: VM based on CentOS 7 with VNIC bridged to Physical NIC on hypervisor
  • Container runtime: docker://19.3.5
  • k8s cluster deployment mode: from scratch
  • k8s network plugin: flannel + metallb

Thanks a lot for any help

2 Answers


  1. Chosen as BEST ANSWER

    Deleting and recreating the service, with the same spec, solved the problem.

    I don't know why, because I compared the saved YAML files from before deletion and after recreation and they have the same fields.
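
    Roughly, the steps were something like this (using the service name from the question as an example):

    # save the current spec, then delete and recreate the service
    kubectl -n swp-customer get service swpc-25abfa45-ac5c-487f-81b9-178602c569f3 -o yaml > svc.yaml
    # strip server-set fields (resourceVersion, uid, creationTimestamp, status, ...) from svc.yaml before re-applying
    kubectl -n swp-customer delete service swpc-25abfa45-ac5c-487f-81b9-178602c569f3
    kubectl -n swp-customer apply -f svc.yaml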


  2. The issue: https://github.com/kubernetes/kubernetes/issues/110208

    You can just restart/recreate one of the service backends as a workaround.
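
    For example, assuming the backends are managed by a Deployment (the names below are placeholders):

    # recreating the backend pods also recreates their endpoints
    kubectl -n swp-customer rollout restart deployment/<backend-deployment>
    # or simply delete one backend pod and let it be recreated
    kubectl -n swp-customer delete pod <one-backend-pod>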
