
I have a Redis DB running on my minikube cluster. I shut minikube down and started it again after 3 days, and now my Redis pod fails to come up with the error below in the pod log:

Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>.

Below is the StatefulSet YAML for the Redis master, deployed via a Helm chart:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    meta.helm.sh/release-name: test-redis
    meta.helm.sh/release-namespace: test
  generation: 1
  labels:
    app.kubernetes.io/component: master
    app.kubernetes.io/instance: test-redis
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: redis
    helm.sh/chart: redis-14.8.11
  name: test-redis-master
  namespace: test
  resourceVersion: "191902"
  uid: 3a4e541f-154f-4c54-a379-63974d90089e
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: master
      app.kubernetes.io/instance: test-redis
      app.kubernetes.io/name: redis
  serviceName: test-redis-headless
  template:
    metadata:
      annotations:
        checksum/configmap: dd1f90e0231e5f9ebd1f3f687d534d9ec53df571cba9c23274b749c01e5bc2bb
        checksum/health: xxxxx
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: test-redis
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: redis
        helm.sh/chart: redis-14.8.11
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: master
                  app.kubernetes.io/instance: test-redis
                  app.kubernetes.io/name: redis
              namespaces:
              - tyk
              topologyKey: kubernetes.io/hostname
            weight: 1
      containers:
      - args:
        - -c
        - /opt/bitnami/scripts/start-scripts/start-master.sh
        command:
        - /bin/bash
        env:
        - name: BITNAMI_DEBUG
          value: "false"
        - name: REDIS_REPLICATION_MODE
          value: master
        - name: ALLOW_EMPTY_PASSWORD
          value: "no"
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              key: redis-password
              name: test-redis
        - name: REDIS_TLS_ENABLED
          value: "no"
        - name: REDIS_PORT
          value: "6379"
        image: docker.io/bitnami/redis:6.2.5-debian-10-r11
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - /health/ping_liveness_local.sh 5
          failureThreshold: 5
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 6
        name: redis
        ports:
        - containerPort: 6379
          name: redis
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - /health/ping_readiness_local.sh 1
          failureThreshold: 5
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 2
        resources: {}
        securityContext:
          runAsUser: 1001
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/bitnami/scripts/start-scripts
          name: start-scripts
        - mountPath: /health
          name: health
        - mountPath: /data
          name: redis-data
        - mountPath: /opt/bitnami/redis/mounted-etc
          name: config
        - mountPath: /opt/bitnami/redis/etc/
          name: redis-tmp-conf
        - mountPath: /tmp
          name: tmp
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1001
      serviceAccount: test-redis
      serviceAccountName: test-redis
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 493
          name: test-redis-scripts
        name: start-scripts
      - configMap:
          defaultMode: 493
          name: test-redis-health
        name: health
      - configMap:
          defaultMode: 420
          name: test-redis-configuration
        name: config
      - emptyDir: {}
        name: redis-tmp-conf
      - emptyDir: {}
        name: tmp
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: test-redis
        app.kubernetes.io/name: redis
      name: redis-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi
      volumeMode: Filesystem
    status:
      phase: Pending

Please let me know how I can fix this.

2 Answers


    • I think your Redis did not shut down gracefully, so the AOF (append-only file) was left in a bad format.

    • You should repair the AOF file with an init container that runs redis-check-aof --fix on it:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      annotations:
        meta.helm.sh/release-name: test-redis
        meta.helm.sh/release-namespace: test
      generation: 1
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: test-redis
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: redis
        helm.sh/chart: redis-14.8.11
      name: test-redis-master
      namespace: test
      resourceVersion: "191902"
      uid: 3a4e541f-154f-4c54-a379-63974d90089e
    spec:
      podManagementPolicy: OrderedReady
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app.kubernetes.io/component: master
          app.kubernetes.io/instance: test-redis
          app.kubernetes.io/name: redis
      serviceName: test-redis-headless
      template:
        metadata:
          annotations:
            checksum/configmap: dd1f90e0231e5f9ebd1f3f687d534d9ec53df571cba9c23274b749c01e5bc2bb
            checksum/health: xxxxx
          creationTimestamp: null
          labels:
            app.kubernetes.io/component: master
            app.kubernetes.io/instance: test-redis
            app.kubernetes.io/managed-by: Helm
            app.kubernetes.io/name: redis
            helm.sh/chart: redis-14.8.11
        spec:
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - podAffinityTerm:
                  labelSelector:
                    matchLabels:
                      app.kubernetes.io/component: master
                      app.kubernetes.io/instance: test-redis
                      app.kubernetes.io/name: redis
                  namespaces:
                  - tyk
                  topologyKey: kubernetes.io/hostname
                weight: 1
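          initContainers:
          - name: repair-redis
            image: docker.io/bitnami/redis:6.2.5-debian-10-r11
            # --fix prompts for confirmation, so answer "y" on stdin; mount the data volume so the AOF is visible
            command: ['sh', '-c', "yes | redis-check-aof --fix /data/appendonly.aof"]
            volumeMounts:
            - mountPath: /data
              name: redis-data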
          containers:
          - args:
            - -c
            - /opt/bitnami/scripts/start-scripts/start-master.sh
            command:
            - /bin/bash
            env:
            - name: BITNAMI_DEBUG
              value: "false"
            - name: REDIS_REPLICATION_MODE
              value: master
            - name: ALLOW_EMPTY_PASSWORD
              value: "no"
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: redis-password
                  name: test-redis
            - name: REDIS_TLS_ENABLED
              value: "no"
            - name: REDIS_PORT
              value: "6379"
            image: docker.io/bitnami/redis:6.2.5-debian-10-r11
            imagePullPolicy: IfNotPresent
            livenessProbe:
              exec:
                command:
                - sh
                - -c
                - /health/ping_liveness_local.sh 5
              failureThreshold: 5
              initialDelaySeconds: 20
              periodSeconds: 5
              successThreshold: 1
              timeoutSeconds: 6
            name: redis
            ports:
            - containerPort: 6379
              name: redis
              protocol: TCP
            readinessProbe:
              exec:
                command:
                - sh
                - -c
                - /health/ping_readiness_local.sh 1
              failureThreshold: 5
              initialDelaySeconds: 20
              periodSeconds: 5
              successThreshold: 1
              timeoutSeconds: 2
            resources: {}
            securityContext:
              runAsUser: 1001
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /opt/bitnami/scripts/start-scripts
              name: start-scripts
            - mountPath: /health
              name: health
            - mountPath: /data
              name: redis-data
            - mountPath: /opt/bitnami/redis/mounted-etc
              name: config
            - mountPath: /opt/bitnami/redis/etc/
              name: redis-tmp-conf
            - mountPath: /tmp
              name: tmp
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext:
            fsGroup: 1001
          serviceAccount: test-redis
          serviceAccountName: test-redis
          terminationGracePeriodSeconds: 30
          volumes:
          - configMap:
              defaultMode: 493
              name: test-redis-scripts
            name: start-scripts
          - configMap:
              defaultMode: 493
              name: test-redis-health
            name: health
          - configMap:
              defaultMode: 420
              name: test-redis-configuration
            name: config
          - emptyDir: {}
            name: redis-tmp-conf
          - emptyDir: {}
            name: tmp
      updateStrategy:
        rollingUpdate:
          partition: 0
        type: RollingUpdate
      volumeClaimTemplates:
      - apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          creationTimestamp: null
          labels:
            app.kubernetes.io/component: master
            app.kubernetes.io/instance: test-redis
            app.kubernetes.io/name: redis
          name: redis-data
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 8Gi
          volumeMode: Filesystem
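
One caveat with running the repair unconditionally: redis-check-aof exits with an error when the file does not exist (for example on a fresh volume), which would keep the init container failing. A minimal sketch of a guarded version, which also takes the backup that the error message asks for, might look like this. The function name `repair_aof`, the `AOF_FILE` variable and the backup suffix are my own choices for illustration, not something the chart provides:

```shell
#!/bin/sh
# Repair an AOF only if it exists, keeping a timestamped backup first.
repair_aof() {
  aof_file=$1

  # Fresh volume: nothing to repair, let the main container start normally
  if [ ! -f "$aof_file" ]; then
    echo "no AOF at $aof_file, nothing to repair"
    return 0
  fi

  # Back up before touching the file, as the Redis error message recommends
  cp -p "$aof_file" "$aof_file.bak.$(date +%Y%m%d%H%M%S)" || return 1

  # --fix asks "Continue? [y/N]" on stdin, so feed it "y" non-interactively;
  # the function returns the exit status of redis-check-aof
  yes | redis-check-aof --fix "$aof_file"
}

# AOF_FILE is a hypothetical override; /data/appendonly.aof is the path used above
repair_aof "${AOF_FILE:-/data/appendonly.aof}"
```

Keep in mind that --fix truncates everything after the first invalid byte, so the backup is what lets you recover if too much was dropped.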
    
    
  1. I am not a Redis expert, but from what I can see:

    kubectl describe pod red3-redis-master-0
    ...
    Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>
    ...
    

    This means that your appendonly.aof file is corrupted: it contains invalid byte sequences in the middle.

    How to proceed if redis-master does not start:

    • Verify the PVC attached to the redis-master pod:
    kubectl get pvc
    
    NAME                               STATUS   VOLUME                                    
    redis-data-red3-redis-master-0     Bound    pvc-cf59a0b2-a3ee-4f7f-9f07-8f4922518359  
    
    • Create a new redis-client pod with the same PVC, redis-data-red3-redis-master-0:
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: redis-client
    spec:
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: redis-data-red3-redis-master-0
      containers:
        - name: redis
          image: docker.io/bitnami/redis:6.2.3-debian-10-r0
          command: ["/bin/bash"]
          args: ["-c", "sleep infinity"]
          volumeMounts:
            - mountPath: "/tmp"
              name: data
    EOF
    
    • Backup your files:
    kubectl cp redis-client:/tmp .
    
    • Repair appendonly.aof file:
    kubectl exec -it redis-client -- /bin/bash
    
    cd /tmp
    
    # make copy of appendonly.aof file:
    cp appendonly.aof appendonly.aofbackup
    
    # verify appendonly.aof file:
    redis-check-aof appendonly.aof
    
    ...
    0x              38: Expected prefix '*', got: '"'
    AOF analyzed: size=62, ok_up_to=56, ok_up_to_line=13, diff=6
    AOF is not valid. Use the --fix option to try fixing it.
    ...
    
    # repair appendonly.aof file:
    redis-check-aof --fix appendonly.aof
    
    # compare files using diff:
    diff appendonly.aof appendonly.aofbackup
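
Before running --fix it can help to read the numbers in the "AOF analyzed" line, since `diff` is the number of bytes that would be cut off the end of the file. A small sketch that extracts them, using the report line from the output above (the helper name `field` and the variable names are just for illustration):

```shell
#!/bin/sh
# Report line copied from the redis-check-aof output shown above
report='AOF analyzed: size=62, ok_up_to=56, ok_up_to_line=13, diff=6'

# Extract the numeric value of one "name=value" field from the report line
field() {
  printf '%s\n' "$report" | sed -n "s/.*[ ,]$1=\([0-9][0-9]*\).*/\1/p"
}

size=$(field size)          # total file size in bytes
ok_up_to=$(field ok_up_to)  # last offset that still parses as valid
dropped=$(field diff)       # bytes after that offset

# Integer percentage of the file that --fix would discard
lost_pct=$((dropped * 100 / size))

echo "valid prefix: $ok_up_to of $size bytes"
echo "--fix would discard $dropped bytes (about $lost_pct% of the file)"
```

If the discarded share is large, the corruption is near the start of the file and a manual repair or a restore from backup is probably worth considering first.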
    

    Note:

    As per docs:

    The best thing to do is to run the redis-check-aof utility, initially without the --fix option, then understand the problem, jump at the given offset in the file, and see if it is possible to manually repair the file: the AOF uses the same format of the Redis protocol and is quite simple to fix manually. Otherwise it is possible to let the utility fix the file for us, but in that case all the AOF portion from the invalid part to the end of the file may be discarded, leading to a massive amount of data loss if the corruption happened to be in the initial part of the file.
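
To make the "jump at the given offset" step concrete, here is a rough sketch that looks at the bytes after the last valid offset and then keeps only the valid prefix by hand. It builds a tiny sample file so it can be run anywhere; the sample data, the temporary directory and the offset of 54 are assumptions for illustration. On a real file the offset would come from the ok_up_to value that redis-check-aof prints:

```shell
#!/bin/sh
# Build a tiny sample AOF: two valid commands followed by garbage bytes.
# (Hypothetical data; a real file would be the appendonly.aof from the volume.)
workdir=$(mktemp -d)
aof="$workdir/appendonly.aof"
printf '*2\r\n$6\r\nSELECT\r\n$1\r\n0\r\n' > "$aof"
printf '*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n' >> "$aof"
printf '"junk"' >> "$aof"

# Assumed offset for this sample; take it from "ok_up_to=" on a real file
ok_up_to=54

# Always work from a backup copy
cp "$aof" "$aof.backup"

# Inspect what comes after the last valid offset
tail -c +"$((ok_up_to + 1))" "$aof.backup" | od -c

# Keep only the valid prefix
head -c "$ok_up_to" "$aof.backup" > "$aof"
wc -c < "$aof"
```

Cutting at ok_up_to gives roughly the same result as --fix; the point of looking at the bytes first is to decide whether something smarter than truncation is possible.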

    In addition as described in the comments by @Miffa Young you can verify where your data is stored using k8s.io/minikube-hostpath provisioner:

    kubectl get pv 
    ...
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                      
    pvc-cf59a0b2-a3ee-4f7f-9f07-8f4922518359   8Gi        RWO            Delete           Bound    default/redis-data-red3-redis-master-0     
    ...
    
    kubectl describe pv pvc-cf59a0b2-a3ee-4f7f-9f07-8f4922518359
    ...
    Source:
        Type:          HostPath (bare host directory volume)
        Path:          /tmp/hostpath-provisioner/default/redis-data-red3-redis-master-0
    ...
    

    Your Redis instance is failing because the malformed appendonly.aof is stored persistently at this location.

    You can SSH into your VM:

    minikube -p redis ssh 
    cd /tmp/hostpath-provisioner/default/redis-data-red3-redis-master-0
    # from there you can backup/repair/remove your files:
    

    Another solution is to install the chart under a new release name; in that case a new set of PVs and PVCs will be created for the Redis StatefulSets, so Redis starts with an empty data set.
