I’m trying to set up a k8s cluster. I’ve already deployed an ingress controller and cert-manager. However, while deploying my first small service (Spring Cloud Config Server), I noticed that my pods cannot access services that are running on other nodes.
The pod tries to resolve a publicly available DNS name and fails with a timeout while trying to reach the CoreDNS service.
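For example, a quick test along these lines runs into the timeout (example.org just stands in for the actual public name, and busybox is only used here for illustration):

kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup example.org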
My cluster looks like this:
Nodes:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master Ready master 6d17h v1.17.2 10.0.0.10 <none> CentOS Linux 7 (Core) 5.5.0-1.el7.elrepo.x86_64 docker://19.3.5
node-1 Ready <none> 6d17h v1.17.2 10.0.0.11 <none> CentOS Linux 7 (Core) 5.5.0-1.el7.elrepo.x86_64 docker://19.3.5
node-2 Ready <none> 6d17h v1.17.2 10.0.0.12 <none> CentOS Linux 7 (Core) 5.5.0-1.el7.elrepo.x86_64 docker://19.3.5
Pods:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cert-manager cert-manager-c6cb4cbdf-kcdhx 1/1 Running 1 23h 10.244.2.22 node-2 <none> <none>
cert-manager cert-manager-cainjector-76f7596c4-5f2h8 1/1 Running 3 23h 10.244.1.21 node-1 <none> <none>
cert-manager cert-manager-webhook-8575f88c85-b7vcx 1/1 Running 1 23h 10.244.2.23 node-2 <none> <none>
ingress-nginx ingress-nginx-5kghx 1/1 Running 1 6d16h 10.244.1.23 node-1 <none> <none>
ingress-nginx ingress-nginx-kvh5b 1/1 Running 1 6d16h 10.244.0.6 k8s-master <none> <none>
ingress-nginx ingress-nginx-rrq4r 1/1 Running 1 6d16h 10.244.2.21 node-2 <none> <none>
project1 config-server-7897679d5d-q2hmr 0/1 CrashLoopBackOff 1 103m 10.244.1.22 node-1 <none> <none>
project1 config-server-7897679d5d-vvn6s 1/1 Running 1 21h 10.244.2.24 node-2 <none> <none>
kube-system coredns-6955765f44-7ttww 1/1 Running 2 6d17h 10.244.2.20 node-2 <none> <none>
kube-system coredns-6955765f44-b57kq 1/1 Running 2 6d17h 10.244.2.19 node-2 <none> <none>
kube-system etcd-k8s-master 1/1 Running 5 6d17h 10.0.0.10 k8s-master <none> <none>
kube-system kube-apiserver-k8s-master 1/1 Running 5 6d17h 10.0.0.10 k8s-master <none> <none>
kube-system kube-controller-manager-k8s-master 1/1 Running 8 6d17h 10.0.0.10 k8s-master <none> <none>
kube-system kube-flannel-ds-amd64-f2lw8 1/1 Running 11 6d17h 10.0.0.10 k8s-master <none> <none>
kube-system kube-flannel-ds-amd64-kt6ts 1/1 Running 11 6d17h 10.0.0.11 node-1 <none> <none>
kube-system kube-flannel-ds-amd64-pb8r9 1/1 Running 12 6d17h 10.0.0.12 node-2 <none> <none>
kube-system kube-proxy-b64jt 1/1 Running 5 6d17h 10.0.0.12 node-2 <none> <none>
kube-system kube-proxy-bltzm 1/1 Running 5 6d17h 10.0.0.10 k8s-master <none> <none>
kube-system kube-proxy-fl9xb 1/1 Running 5 6d17h 10.0.0.11 node-1 <none> <none>
kube-system kube-scheduler-k8s-master 1/1 Running 7 6d17h 10.0.0.10 k8s-master <none> <none>
Services:
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
cert-manager cert-manager ClusterIP 10.102.188.88 <none> 9402/TCP 23h app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=cert-manager
cert-manager cert-manager-webhook ClusterIP 10.96.98.94 <none> 443/TCP 23h app.kubernetes.io/instance=cert-manager,app.kubernetes.io/managed-by=Helm,app.kubernetes.io/name=webhook,app=webhook
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6d17h <none>
ingress-nginx ingress-nginx NodePort 10.101.135.13 <none> 80:31080/TCP,443:31443/TCP 6d16h app.kubernetes.io/name=ingress-nginx,app.kubernetes.io/part-of=ingress-nginx
project1 config-server ClusterIP 10.99.94.55 <none> 80/TCP 24h app=config-server,release=config-server
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 6d17h k8s-app=kube-dns
I’ve noticed that my newly deployed service has no access to the CoreDNS service on node-1. My CoreDNS service has two pods, neither of which is running on node-1. If I understand it correctly, it should be possible to reach the CoreDNS pods via the service IP (10.96.0.10) from every node, whether or not a CoreDNS pod runs on that node.
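To illustrate: this is the kind of lookup I would expect to be answered from any node (using dig merely as an example tool):

dig @10.96.0.10 kubernetes.default.svc.cluster.local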
I’ve already noticed that the routing tables on the nodes look like this:
default via 172.31.1.1 dev eth0
10.0.0.0/16 via 10.0.0.1 dev eth1 proto static
10.0.0.1 dev eth1 scope link
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.31.1.1 dev eth0 scope link
So, as you can see, there is no route to the 10.96.0.0/16 network.
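For instance, asking the kernel for the route to the DNS service IP can, given the table above, only fall through to the default route:

ip route get 10.96.0.10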
I’ve already checked the ports and the net.bridge.bridge-nf-call-iptables and net.bridge.bridge-nf-call-ip6tables sysctl values. All Flannel ports are reachable and should be able to receive traffic over the 10.0.0.0/24 network.
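For reference, this is the kind of sysctl check I mean (the expected value for both keys is 1):

sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables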
Here is the output of iptables -L on node-1:
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL all -- anywhere anywhere
ACCEPT tcp -- anywhere anywhere tcp dpt:22
ACCEPT icmp -- anywhere anywhere
ACCEPT udp -- anywhere anywhere udp spt:ntp
ACCEPT tcp -- 10.0.0.0/24 anywhere
ACCEPT udp -- 10.0.0.0/24 anywhere
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
LOG all -- anywhere anywhere limit: avg 15/min burst 5 LOG level debug prefix "Dropped by firewall: "
DROP all -- anywhere anywhere
Chain FORWARD (policy DROP)
target prot opt source destination
KUBE-FORWARD all -- anywhere anywhere /* kubernetes forwarding rules */
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
DOCKER-USER all -- anywhere anywhere
DOCKER-ISOLATION-STAGE-1 all -- anywhere anywhere
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
DOCKER all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT all -- 10.244.0.0/16 anywhere
ACCEPT all -- anywhere 10.244.0.0/16
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL all -- anywhere anywhere
ACCEPT udp -- anywhere anywhere udp dpt:ntp
Chain DOCKER (1 references)
target prot opt source destination
Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target prot opt source destination
DOCKER-ISOLATION-STAGE-2 all -- anywhere anywhere
RETURN all -- anywhere anywhere
Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target prot opt source destination
DROP all -- anywhere anywhere
RETURN all -- anywhere anywhere
Chain DOCKER-USER (1 references)
target prot opt source destination
RETURN all -- anywhere anywhere
Chain KUBE-EXTERNAL-SERVICES (1 references)
target prot opt source destination
Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- anywhere anywhere /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
Chain KUBE-FORWARD (1 references)
target prot opt source destination
DROP all -- anywhere anywhere ctstate INVALID
ACCEPT all -- anywhere anywhere /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT all -- 10.244.0.0/16 anywhere /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere 10.244.0.0/16 /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination
Chain KUBE-SERVICES (3 references)
target prot opt source destination
REJECT tcp -- anywhere 10.99.94.55 /* project1/config-server:http has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable
The cluster is deployed via Ansible.
I’m sure I’m doing something wrong; however, I can’t see what it is.
Can somebody help me here?
Thanks
3 Answers
I've followed the suggestion from Dawid Kruk and tried it with Kubespray. Now it works as intended. If I manage to figure out where my mistake was, I will post it here for future reference.
Edit: Solution
My firewall rules were too restrictive. Flannel creates new interfaces, and since my rules were not restricted to my main interface, nearly every packet from Flannel was dropped. If I had looked at journalctl more attentively, I would have found the issue earlier.
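Roughly, rules like these would have avoided the problem in my setup (flannel.1 and cni0 are the interfaces created by Flannel/CNI, as visible in the routing table above; the exact rules depend on your firewall layout):

iptables -I INPUT -i flannel.1 -j ACCEPT   # insert before the catch-all LOG/DROP rules
iptables -I INPUT -i cni0 -j ACCEPT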
I am not sure what the exact issue is here, but I would like to clarify a few things.
Cluster IPs are virtual IPs. They are not routed via routing tables. Instead, for each cluster IP, kube-proxy adds NAT table entries on its respective node. To check those entries, run:

sudo iptables -t nat -L -n -v

Now, the CoreDNS pods are exposed via a service cluster IP. Hence, whenever a packet arrives at a node with the cluster IP as its destination address, that destination address is rewritten to a pod IP address, which is routable from all nodes (thanks to Flannel). This rewrite is done via a DNAT target entry in the NAT table, which looks roughly like the sketch below.
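Abridged sketch (the KUBE-SEP chain names are hashed and will differ per cluster; 10.244.2.19 is just one of the CoreDNS pod IPs from your listing):

Chain KUBE-SEP-XXXXXXXXXXXXXXXX (1 references)
target          prot opt source        destination
KUBE-MARK-MASQ  all  --  10.244.2.19   anywhere
DNAT            udp  --  anywhere      anywhere     /* kube-system/kube-dns:dns */ udp to:10.244.2.19:53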
So if you can reproduce the issue, check the NAT table entries to see whether everything is in order.
I experienced the same issue on Kubernetes with the Calico network stack under Debian Buster.
After checking a lot of configs and parameters, I ended up getting it to work by changing the policy of the FORWARD chain to ACCEPT. This made it clear that the issue was somewhere around the firewall. Due to security considerations I changed it back.
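For reference, the temporary test was roughly just flipping the chain policy and restoring it afterwards:

iptables -P FORWARD ACCEPT   # temporary test only
iptables -P FORWARD DROP     # restore the restrictive policy afterwards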
Running iptables -L gave me the following revealing warning:

# Warning: iptables-legacy tables present, use iptables-legacy to see them
The output of the list command did not contain any Calico rules. Running iptables-legacy -L showed me the Calico rules, so it now seems obvious why it didn’t work: Calico uses the legacy interface.

The issue is Debian’s switch to iptables-nft in the alternatives system. You can check which backend is active and switch back to the legacy one as sketched below.
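A sketch of the check and the switch back to the legacy backend (paths as on Debian Buster; handle ip6tables analogously if you use it):

update-alternatives --display iptables                        # shows which backend is currently selected
update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy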
Now it all works fine!
Thanks to Long on the Kubernetes Slack channel for pointing the way to solving it.