I’m experiencing an issue upgrading kube-proxy from 1.21 to 1.22.
I have already updated the control-plane components (apiserver, scheduler and controller-manager) to 1.22 without any problems.
When I updated the first worker node (kubelet and kube-proxy) from 1.21 to 1.22, the LoadBalancer Service on that node became unreachable. Reverting to 1.21 fixed the problem.
I verified that ARP requests receive replies with the correct MAC address, and tcpdump on the node's NIC shows the expected traffic flow.
After some investigation of the iptables rules on the worker node, I noticed that the 1.22 node has this rule (nat table):
-A KUBE-XLB-GYH4OE6JZWRDML2Y -m comment --comment "swp-customer/swpc-25abfa45-ac5c-487f-81b9-178602c569f3:http has no local endpoints" -j KUBE-MARK-DROP
On the 1.21 node, instead, I have these rules:
-A KUBE-XLB-B67G6CBBIZ3WMS7Y -m comment --comment "Balancing rule 0 for swp-customer/swpc-2ad2a9e3-25cf-430e-893b-dbd4ec77b197:http" -j KUBE-SEP-3LIV6VCSPFRWVHFU
-A KUBE-SEP-3LIV6VCSPFRWVHFU -p tcp -m comment --comment "swp-customer/swpc-2ad2a9e3-25cf-430e-893b-dbd4ec77b197:http" -m tcp -j DNAT --to-destination
The second one, on the 1.21 node, is the correct rule for NATing traffic to the container.
I guess that kube-proxy 1.22 thinks there are no local endpoints (reverting to kube-proxy 1.21 on the same node works fine), but I can’t figure out why. kube-proxy seems to start normally and there is nothing unusual in its log.
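To check which Services a node is dropping for this reason, the nat rules can be scanned for the "has no local endpoints" comment. This is only a rough sketch: the function name and the regular expression are my own, and it assumes the rule text has the same layout as the lines quoted above.

```python
import re

# Matches the comment on the KUBE-MARK-DROP rule quoted above.
# Assumption: the rule layout is the same as in the quoted iptables output.
NO_LOCAL_ENDPOINTS = re.compile(
    r'^-A (?P<chain>KUBE-XLB-\S+) .*--comment "(?P<service>\S+) has no local endpoints"'
)


def find_services_without_local_endpoints(rules_text):
    """Return (chain, service) pairs for every 'has no local endpoints' rule."""
    hits = []
    for line in rules_text.splitlines():
        match = NO_LOCAL_ENDPOINTS.match(line.strip())
        if match:
            hits.append((match.group("chain"), match.group("service")))
    return hits


# The two KUBE-XLB rules from the question, used here as sample input.
SAMPLE = """\
-A KUBE-XLB-GYH4OE6JZWRDML2Y -m comment --comment "swp-customer/swpc-25abfa45-ac5c-487f-81b9-178602c569f3:http has no local endpoints" -j KUBE-MARK-DROP
-A KUBE-XLB-B67G6CBBIZ3WMS7Y -m comment --comment "Balancing rule 0 for swp-customer/swpc-2ad2a9e3-25cf-430e-893b-dbd4ec77b197:http" -j KUBE-SEP-3LIV6VCSPFRWVHFU
"""

if __name__ == "__main__":
    for chain, service in find_services_without_local_endpoints(SAMPLE):
        print(f"{chain}: {service}")
```

On a real node the input would be the output of `iptables-save -t nat` instead of the sample string, so the list can be compared between a 1.21 node and a 1.22 node.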
My environment:
- k8s nodes: VMs based on CentOS 7 with a vNIC bridged to a physical NIC on the hypervisor
- Container runtime: docker://19.3.5
- k8s cluster deployment mode: from scratch
- k8s network plugin: flannel + metallb
Thanks a lot for any help
Deleting and recreating the Service with the same spec solved the problem.
I don't know why, because I compared the YAML files saved before deletion and after recreation, and they have the same fields.
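For anyone who wants to repeat that comparison, a line-by-line diff of the two saved manifests makes any difference easy to spot. This is a minimal sketch using Python's standard `difflib`. The function name, the file labels and the sample manifest values are made up for illustration and are not taken from my cluster.

```python
import difflib


def diff_manifests(before_text, after_text):
    """Return unified-diff lines between two saved manifests (empty list if identical)."""
    return list(
        difflib.unified_diff(
            before_text.splitlines(),
            after_text.splitlines(),
            fromfile="before.yaml",  # hypothetical label
            tofile="after.yaml",     # hypothetical label
            lineterm="",
        )
    )


# Hypothetical manifests: only a server-populated field differs.
BEFORE = """\
kind: Service
metadata:
  name: example
  resourceVersion: "100"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
"""
AFTER = BEFORE.replace('resourceVersion: "100"', 'resourceVersion: "200"')

if __name__ == "__main__":
    for line in diff_manifests(BEFORE, AFTER):
        print(line)
```

Server-populated metadata such as `uid`, `resourceVersion` and `creationTimestamp` will always differ after a recreate, so only differences outside those fields are interesting.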
The issue: https://github.com/kubernetes/kubernetes/issues/110208
As a workaround, you can just restart or recreate one of the Service's backends.
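One way to do that, sketched below, is to delete a single backend pod and let its controller recreate it. This assumes the pod is managed by a Deployment or similar controller; a bare pod would simply be gone. The helper names and the pod name are hypothetical, and by default the script only prints the command rather than running it.

```python
import subprocess


def delete_pod_command(namespace, pod):
    """Build the kubectl argv that deletes one pod in the given namespace."""
    return ["kubectl", "-n", namespace, "delete", "pod", pod]


def recreate_backend(namespace, pod, dry_run=True):
    """Print the command (dry run) or run it; the pod's controller then recreates it."""
    command = delete_pod_command(namespace, pod)
    if dry_run:
        print(" ".join(command))
    else:
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # "example-backend-pod" is a placeholder; use a real pod name behind the Service.
    recreate_backend("swp-customer", "example-backend-pod")
```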