I am running an EKS cluster with a Fargate profile. I checked the node status using kubectl describe node and it is showing disk pressure:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Tue, 12 Jul 2022 03:10:33 +0000 Wed, 29 Jun 2022 13:21:17 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Tue, 12 Jul 2022 03:10:33 +0000 Wed, 06 Jul 2022 19:46:54 +0000 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Tue, 12 Jul 2022 03:10:33 +0000 Wed, 29 Jun 2022 13:21:17 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 12 Jul 2022 03:10:33 +0000 Wed, 29 Jun 2022 13:21:27 +0000 KubeletReady kubelet is posting ready status
There is also a failed garbage collection event:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FreeDiskSpaceFailed 11m (x844 over 2d22h) kubelet failed to garbage collect required amount of images. Wanted to free 6314505830 bytes, but freed 0 bytes
Warning EvictionThresholdMet 65s (x45728 over 5d7h) kubelet Attempting to reclaim ephemeral-storage
I think the disk is filling up quickly because of application logs. The application writes its logs to stdout, which, per the AWS documentation, the container agent writes to log files, and I am using the Fargate built-in Fluent Bit to push the application logs to an OpenSearch cluster.
But it looks like the EKS cluster is not deleting the old log files created by the container agent.
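For context, the Fargate built-in Fluent Bit is configured through a ConfigMap named aws-logging in the aws-observability namespace. A minimal sketch of an OpenSearch output (the host, region, and index below are placeholders, not my real values) looks roughly like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-logging
  namespace: aws-observability
data:
  output.conf: |
    [OUTPUT]
        Name        es
        Match       *
        Host        my-domain.us-west-2.es.amazonaws.com
        Port        443
        Index       my-apis-logs
        AWS_Auth    On
        AWS_Region  us-west-2
        tls         On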
I wanted to SSH into the Fargate nodes to debug the issue further, but per AWS support, SSH into Fargate nodes is not possible.
What can be done to remove the disk pressure from the Fargate nodes?
Update: As suggested in the answers, I am running logrotate as a sidecar. But according to the logs of the logrotate container, it is not able to find the directory:
rotating pattern: /var/log/containers/*.log
52428800 bytes (5 rotations)
empty log files are not rotated, old logs are removed
considering log /var/log/containers/*.log
log /var/log/containers/*.log does not exist -- skipping
reading config file /etc/logrotate.conf
Reading state from file: /var/lib/logrotate.status
Allocating hash table for state file, size 64 entries
Creating new state
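For reference, those settings should correspond to a logrotate stanza roughly like the one below (a sketch based on the env vars; I have not inspected the /etc/logrotate.conf that the image actually generates):

/var/log/containers/*.log {
    size 50M
    rotate 5
    notifempty
}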
The YAML file is:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-apis
  namespace: kube-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: my-apis
        image: 111111xxxxx.dkr.ecr.us-west-2.amazonaws.com/my-apis:1.0.3
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: "1000m"
            memory: "1200Mi"
          requests:
            cpu: "1000m"
            memory: "1200Mi"
        readinessProbe:
          httpGet:
            path: "/ping"
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 2
        livenessProbe:
          httpGet:
            path: "/ping"
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 5
      - name: logrotate
        image: realz/logrotate
        volumeMounts:
        - mountPath: /var/log/containers
          name: my-app-logs
        env:
        - name: CRON_EXPR
          value: "*/5 * * * *"
        - name: LOGROTATE_LOGFILES
          value: "/var/log/containers/*.log"
        - name: LOGROTATE_FILESIZE
          value: "50M"
        - name: LOGROTATE_FILENUM
          value: "5"
      volumes:
      - name: my-app-logs
        emptyDir: {}
2 Answers
Found the cause of the disk filling quickly. It was the logging library logback writing logs to both files and the console, and the log rotation policy in logback was retaining a large number of log files for long periods. Removing the appender in the logback config that writes to files fixed the issue.
I also found out that the stdout logs written to files by the container agent are rotated, with a file size of 10 MB and a maximum of 5 files, so they cannot cause disk pressure.
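A minimal sketch of the resulting logback.xml, keeping only a console appender (the appender name and pattern are illustrative, not my exact config):

<configuration>
  <!-- Console appender only: stdout is picked up by the container agent and Fluent Bit -->
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <!-- The RollingFileAppender that wrote to the container filesystem was removed -->
  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>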
What can be done to remove the disk pressure from the Fargate nodes?
There is no known configuration that makes Fargate automatically clean a specific log location. You can run logrotate as a sidecar. There are plenty of choices here.
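If you go the sidecar route, note that an emptyDir volume only contains what your own containers write into it, and Fargate does not expose the node's /var/log/containers to pods. A rough sketch (the image, volume name, mount path, and env values are illustrative) where the application writes its log files into a volume shared with the logrotate sidecar:

    spec:
      containers:
      - name: my-apis
        image: <your-app-image>
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app      # the application must write its log files here
      - name: logrotate
        image: realz/logrotate
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app      # same shared volume, rotated by the sidecar
        env:
        - name: CRON_EXPR
          value: "*/5 * * * *"
        - name: LOGROTATE_LOGFILES
          value: "/var/log/app/*.log"
        - name: LOGROTATE_FILESIZE
          value: "50M"
        - name: LOGROTATE_FILENUM
          value: "5"
      volumes:
      - name: app-logs
        emptyDir: {}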