I have a multi-container pod running on AWS EKS: a web app container listening on port 80 and a Redis container listening on port 6379.
Once the deployment goes through, manual curl requests against the pod's IP:port from within the cluster always get good responses.
Traffic through the ingress to the service is fine as well.
However, the kubelet's probes keep failing, which causes restarts, and I haven't been able to reproduce the probe failure or fix it yet.
Thanks for reading!
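For context, the closest thing I can think of to replicating what the kubelet actually does is an HTTP GET with a hard 1-second deadline (matching the probe's timeoutSeconds: 1), for example from a throwaway pod like the sketch below. This is only an illustration: the target is the pod IP that shows up in the events, and the pod name and curl image are arbitrary placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: probe-check            # placeholder name
spec:
  restartPolicy: Never
  containers:
  - name: curl
    image: curlimages/curl     # any image that ships curl will do
    # GET the pod IP:port with the same 1-second deadline the kubelet's probe uses
    command: ["curl", "-v", "--max-time", "1", "http://10.10.14.199:80/"]
Checking that pod's logs and exit code (ideally with it scheduled on the same node as the app pod) should show whether the 1-second budget alone is simply too tight.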
Here are the events:
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Readiness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Liveness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Readiness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Readiness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Readiness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Liveness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Readiness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Liveness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Liveness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Normal Killing pod/app-7cddfb865b-gsvbg Container app failed liveness probe, will be restarted
0s Normal Pulling pod/app-7cddfb865b-gsvbg Pulling image "registry/app:latest"
0s Normal Pulled pod/app-7cddfb865b-gsvbg Successfully pulled image "registry/app:latest"
0s Normal Created pod/app-7cddfb865b-gsvbg Created container app
With identifying names made generic, this is my deployment YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "16"
  creationTimestamp: "2021-05-26T22:01:19Z"
  generation: 19
  labels:
    app: app
    chart: app-1.0.0
    environment: production
    heritage: Helm
    owner: acme
    release: app
  name: app
  namespace: default
  resourceVersion: "234691173"
  selfLink: /apis/apps/v1/namespaces/default/deployments/app
  uid: 3149acc2-031e-4719-89e6-abafb0bcdc3c
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: app
      release: app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 100%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2021-09-17T09:04:49-07:00"
      creationTimestamp: null
      labels:
        app: app
        environment: production
        owner: acme
        release: app
    spec:
      containers:
      - image: redis:5.0.6-alpine
        imagePullPolicy: IfNotPresent
        name: redis
        ports:
        - containerPort: 6379
          hostPort: 6379
          name: redis
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 500Mi
          requests:
            cpu: 500m
            memory: 500Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - env:
        - name: SYSTEM_ENVIRONMENT
          value: production
        envFrom:
        - configMapRef:
            name: app-production
        - secretRef:
            name: app-production
        image: registry/app:latest
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 80
            scheme: HTTP
          initialDelaySeconds: 90
          periodSeconds: 20
          successThreshold: 1
          timeoutSeconds: 1
        name: app
        ports:
        - containerPort: 80
          hostPort: 80
          name: app
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 80
            scheme: HTTP
          initialDelaySeconds: 90
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "1"
            memory: 500Mi
          requests:
            cpu: "1"
            memory: 500Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      priorityClassName: critical-app
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-08-10T17:34:18Z"
    lastUpdateTime: "2021-08-10T17:34:18Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-05-26T22:01:19Z"
    lastUpdateTime: "2021-09-17T16:48:54Z"
    message: ReplicaSet "app-7f7cb8fd4" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 19
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
This is my service yaml:
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2021-05-05T20:11:33Z"
  labels:
    app: app
    chart: app-1.0.0
    environment: production
    heritage: Helm
    owner: acme
    release: app
  name: app
  namespace: default
  resourceVersion: "163989104"
  selfLink: /api/v1/namespaces/default/services/app
  uid: 1f54cd2f-b978-485e-a1af-984ffeeb7db0
spec:
  clusterIP: 172.20.184.161
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 32648
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: app
    release: app
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
Update 10/20/2021:
So I followed the advice to tune the readiness probe, using these generous settings:
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /
    port: 80
    scheme: HTTP
  initialDelaySeconds: 300
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 10
These are the events:
5m21s Normal Scheduled pod/app-686494b58b-6cjsq Successfully assigned default/app-686494b58b-6cjsq to ip-10-10-14-127.compute.internal
5m20s Normal Created pod/app-686494b58b-6cjsq Created container redis
5m20s Normal Started pod/app-686494b58b-6cjsq Started container redis
5m20s Normal Pulling pod/app-686494b58b-6cjsq Pulling image "registry/app:latest"
5m20s Normal Pulled pod/app-686494b58b-6cjsq Successfully pulled image "registry/app:latest"
5m20s Normal Created pod/app-686494b58b-6cjsq Created container app
5m20s Normal Pulled pod/app-686494b58b-6cjsq Container image "redis:5.0.6-alpine" already present on machine
5m19s Normal Started pod/app-686494b58b-6cjsq Started container app
0s Warning Unhealthy pod/app-686494b58b-6cjsq Readiness probe failed: Get http://10.10.14.117:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Oddly, I do see the readiness probe kick into action when I request the health check page (the root page) manually. Be that as it may, the probe failure isn't because the containers aren't running fine (they are); the problem lies somewhere else.
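One way to take the kubelet-to-pod network path out of the equation would be an exec-based probe that checks the app from inside its own container. A rough sketch, assuming curl is available in the app image (the path and timings are only examples):
readinessProbe:
  exec:
    # run inside the app container, so no pod-network hop is involved
    command: ["curl", "-f", "--max-time", "5", "http://localhost:80/"]
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 10
  failureThreshold: 3
If that variant stays green while the httpGet variant keeps timing out, the issue is between the kubelet and the pod network rather than in the app itself.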
2 Answers
Let's go over your probes so you can understand what is going on and perhaps find a way to fix it:
View the logs/events (for example with kubectl logs <pod> -c app and kubectl describe pod <pod>)
Linking my answer for: liveness and readiness probe for multiple containers in a pod
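If the app genuinely needs more than a second to answer while it is booting or being CPU-throttled (its CPU request and limit are both pinned at one core), more forgiving probes are worth a try. A rough sketch, assuming / really is the intended health endpoint and the cluster is new enough for startupProbe (beta since Kubernetes 1.18); all values here are examples, not recommendations from your setup:
startupProbe:
  httpGet:
    path: /
    port: 80
  periodSeconds: 10
  failureThreshold: 30        # allows up to ~5 minutes of startup before liveness takes over
  timeoutSeconds: 5
livenessProbe:
  httpGet:
    path: /
    port: 80
  periodSeconds: 20
  timeoutSeconds: 5           # the original 1s is easy to exceed under load
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /
    port: 80
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
Since the 10/20 update shows timeouts even with a 10-second limit, it is also worth checking the app's own access logs for the kubelet's requests (they carry a User-Agent of the form kube-probe/<version>) to confirm whether the probes reach the container at all.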