I am running an EKS cluster with a Fargate profile. I checked the node status using kubectl describe node and it is showing disk pressure:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Tue, 12 Jul 2022 03:10:33 +0000 Wed, 29 Jun 2022 13:21:17 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Tue, 12 Jul 2022 03:10:33 +0000 Wed, 06 Jul 2022 19:46:54 +0000 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Tue, 12 Jul 2022 03:10:33 +0000 Wed, 29 Jun 2022 13:21:17 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 12 Jul 2022 03:10:33 +0000 Wed, 29 Jun 2022 13:21:27 +0000 KubeletReady kubelet is posting ready status
There is also a failed garbage collection event:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FreeDiskSpaceFailed 11m (x844 over 2d22h) kubelet failed to garbage collect required amount of images. Wanted to free 6314505830 bytes, but freed 0 bytes
Warning EvictionThresholdMet 65s (x45728 over 5d7h) kubelet Attempting to reclaim ephemeral-storage
I think the disk is filling up quickly because of application logs. The application writes its logs to stdout, which, per the AWS documentation, the container agent writes to log files, and I am using the Fargate built-in Fluent Bit to push the application logs to an OpenSearch cluster.
But it looks like the EKS cluster is not deleting the old log files created by the container agent.
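For context, the Fargate built-in Fluent Bit is configured through a ConfigMap named aws-logging in the aws-observability namespace. A minimal sketch of an OpenSearch output (the host, region, and index below are placeholders, not my real values) looks roughly like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-logging
  namespace: aws-observability
data:
  output.conf: |
    [OUTPUT]
        Name        es
        Match       *
        Host        my-domain.us-west-2.es.amazonaws.com
        Port        443
        Index       my-apis-logs
        AWS_Auth    On
        AWS_Region  us-west-2
        tls         On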
I wanted to SSH into the Fargate nodes to debug the issue further, but per AWS support, SSH into Fargate nodes is not possible.
What can be done to remove the disk pressure from the Fargate nodes?
Update: As suggested in the answers, I am running logrotate as a sidecar. But according to the logs of the logrotate container, it is not able to find the directory:
rotating pattern: /var/log/containers/*.log
52428800 bytes (5 rotations)
empty log files are not rotated, old logs are removed
considering log /var/log/containers/*.log
log /var/log/containers/*.log does not exist -- skipping
reading config file /etc/logrotate.conf
Reading state from file: /var/lib/logrotate.status
Allocating hash table for state file, size 64 entries
Creating new state
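For reference, those settings should correspond to a logrotate stanza roughly like the one below (a sketch based on the env vars; I have not inspected the /etc/logrotate.conf that the image actually generates):

/var/log/containers/*.log {
    size 50M
    rotate 5
    notifempty
}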
The YAML file is:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-apis
  namespace: kube-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: my-apis
        image: 111111xxxxx.dkr.ecr.us-west-2.amazonaws.com/my-apis:1.0.3
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: "1000m"
            memory: "1200Mi"
          requests:
            cpu: "1000m"
            memory: "1200Mi"
        readinessProbe:
          httpGet:
            path: "/ping"
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 2
        livenessProbe:
          httpGet:
            path: "/ping"
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 5
      - name: logrotate
        image: realz/logrotate
        volumeMounts:
        - mountPath: /var/log/containers
          name: my-app-logs
        env:
        - name: CRON_EXPR
          value: "*/5 * * * *"
        - name: LOGROTATE_LOGFILES
          value: "/var/log/containers/*.log"
        - name: LOGROTATE_FILESIZE
          value: "50M"
        - name: LOGROTATE_FILENUM
          value: "5"
      volumes:
      - name: my-app-logs
        emptyDir: {}
2 Answers
Found the cause of the disk filling quickly. It was the logging library logback writing logs to both files and the console, and the log rotation policy in logback was retaining a large number of log files for long periods. Removing the appender in the logback config that writes to files fixed the issue.
I also found out that the stdout logs written to files by the container agent are rotated, with a file size of 10 MB and a maximum of 5 files, so they cannot cause disk pressure.
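A minimal sketch of the resulting logback.xml, keeping only a console appender (the appender name and pattern are illustrative, not my exact config):

<configuration>
  <!-- Console appender only: stdout is picked up by the container agent and Fluent Bit -->
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <!-- The RollingFileAppender that wrote to the container filesystem was removed -->
  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>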
What can be done to remove the disk pressure from the Fargate nodes?
There is no known configuration that makes Fargate automatically clean a specific log location. You can run logrotate as a sidecar. There are plenty of choices here.
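If you go the sidecar route, note that an emptyDir volume only contains what your own containers write into it, and Fargate does not expose the node's /var/log/containers to pods. A rough sketch (the image, volume name, mount path, and env values are illustrative) where the application writes its log files into a volume shared with the logrotate sidecar:

    spec:
      containers:
      - name: my-apis
        image: <your-app-image>
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app      # the application must write its log files here
      - name: logrotate
        image: realz/logrotate
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app      # same shared volume, rotated by the sidecar
        env:
        - name: CRON_EXPR
          value: "*/5 * * * *"
        - name: LOGROTATE_LOGFILES
          value: "/var/log/app/*.log"
        - name: LOGROTATE_FILESIZE
          value: "50M"
        - name: LOGROTATE_FILENUM
          value: "5"
      volumes:
      - name: app-logs
        emptyDir: {}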