Readiness Probe for Redis with large dataset

Marc
September 25, 2020
2 Answers

Issue

I have a Redis K8s deployment that links to a separate service, with a heavily reduced manifest as follows (if more info is needed that’s missing let me know):

apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cache
      environment: dev
  template:
    metadata:
      labels:
        app: cache
        environment: dev
    spec:
      containers:
        - name: cache
          image: marketplace.gcr.io/google/redis5
          imagePullPolicy: IfNotPresent
          livenessProbe:
            exec:
              command:
              - redis-cli
              - ping
            initialDelaySeconds: 30
            timeoutSeconds: 5
          readinessProbe:
            exec:
              command:
              - redis-cli
              - ping
            initialDelaySeconds: 30
            timeoutSeconds: 5
      volumes:
        - name: data
          nfs:
            server: "nfs-server.recs-api.svc.cluster.local"
            path: "/data"

I want to regularly redeploy Redis with a new dataset, instead of updating the existing cache.
When doing a kubectl rollout restart deployment/cache, old Redis pods are terminated before the new Redis pods are ready to accept traffic. The new pods are marked READY and, as expected, the old ones are then terminated; however, redis-cli ping on the new pods returns (error) LOADING Redis is loading the dataset in memory.
It currently takes 5-10 minutes for Redis to finish loading the dataset and be ready to accept connections, but by that point the new pods have been READY for the same amount of time, with active traffic directed to them because the old pods have been terminated.

My suspicion is that the exit status for this response is 0, so the readinessProbe reports READY 1/1 and the old pods are killed; however, I have not been able to find a suitable exec: command: that avoids this issue.

redis-cli info has a loading:0|1 line, and so I tested:

readinessProbe:
  exec:
    command: ["redis-cli", "info", "|", "grep loading:", "|", "grep 0"]

in the hope that for non 0 loading values, grep would provide a non-zero status code and fail the readinessProbe, but this didn’t seem to work and had the same behavior as redis-cli ping with the prematurely terminating pods and loss of service until loading had completed.
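One likely reason the array form above behaved like plain redis-cli ping: an exec probe runs its command directly, without a shell, so "|" and "grep loading:" are passed to redis-cli as literal arguments and no pipeline is ever built. A minimal sketch of the intended check, written so that it can be run through sh -c, follows. The function name loading_finished and the path /scripts/loading_check.sh are hypothetical, and the sketch assumes redis-cli and a POSIX shell are available in the image.

```shell
#!/bin/sh
# Hypothetical helper: reads `redis-cli info persistence` output on stdin and
# succeeds (exit 0) only if it contains the exact line "loading:0".
loading_finished() {
  # INFO lines end in \r\n, so strip carriage returns before matching.
  # -x matches whole lines only, so a line such as "async_loading:0" is ignored.
  tr -d '\r' | grep -qx 'loading:0'
}

# Intended probe command (assumes redis-cli is on PATH in the container):
#   sh -c '. /scripts/loading_check.sh && redis-cli info persistence | loading_finished'
```

If mounting a script is not convenient, the same idea should fit in a single exec command of the form sh, -c, "redis-cli info persistence | grep -q '^loading:0'", since the pipeline is then interpreted by the shell rather than by redis-cli.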

What I want

When deploying new Redis cache pods, I want a pod to be ready to accept connections throughout, while the new Redis cache pods are loading the dataset into memory.
- Ideally in the form of a tidy readinessProbe check, but fully open to any suggestions!
- It’s also possible I’ve misunderstood the purpose of a readinessProbe, so please let me know.
If possible, I would also like to better understand why redis-cli ping or other readinessProbes were still triggering a READY state for the new pods, despite non-zero status codes on exec: command:

Thanks!

Tags: kubernetes redis

Answers

I looked into the bitnami/redis chart to find out how it implements its liveness and readiness probes.

The chart creates a health ConfigMap containing a shell script that health-checks the Redis server with redis-cli ping and inspects the response: anything other than PONG (for example, the LOADING error) makes the script exit with status 1, so the probe fails.

Here is the ConfigMap definition:

data:
  ping_readiness_local.sh: |-
    #!/bin/bash
{{- if .Values.usePasswordFile }}
    password_aux=`cat ${REDIS_PASSWORD_FILE}`
    export REDIS_PASSWORD=$password_aux
{{- end }}
{{- if .Values.usePassword }}
    no_auth_warning=$([[ "$(redis-cli --version)" =~ (redis-cli 5.*) ]] && echo --no-auth-warning)
{{- end }}
    response=$(
      timeout -s 3 $1 \
      redis-cli \
{{- if .Values.usePassword }}
        -a $REDIS_PASSWORD $no_auth_warning \
{{- end }}
        -h localhost \
{{- if .Values.tls.enabled }}
        -p $REDIS_TLS_PORT \
        --tls \
        --cacert {{ template "redis.tlsCACert" . }} \
        {{- if .Values.tls.authClients }}
          --cert {{ template "redis.tlsCert" . }} \
          --key {{ template "redis.tlsCertKey" . }} \
        {{- end }}
{{- else }}
        -p $REDIS_PORT \
{{- end }}
        ping
    )
    if [ "$response" != "PONG" ]; then
      echo "$response"
      exit 1
    fi

Then, in the Deployment/StatefulSet, just set the probe to execute this shell script:

readinessProbe:
    initialDelaySeconds: {{ .Values.redis.readinessProbe.initialDelaySeconds }}
    periodSeconds: {{ .Values.redis.readinessProbe.periodSeconds }}
    timeoutSeconds: {{ .Values.redis.readinessProbe.timeoutSeconds }}
    successThreshold: {{ .Values.redis.readinessProbe.successThreshold }}
    failureThreshold: {{ .Values.redis.readinessProbe.failureThreshold }}
    exec:
      command:
        - sh
        - -c
        - /scripts/ping_readiness_local.sh {{ .Values.redis.readinessProbe.timeoutSeconds }}

The following should work just fine.

The key is:

tcpSocket:
  port: client # named port

The whole snippet:

- name: redis
  image: ${DOCKER_PATH_AND_IMAGE}
  resources:
    limits:
      memory: "1.5Gi"
    requests:
      memory: "1.5Gi"
  ports:
    - name: client
      containerPort: 6379
    - name: gossip
      containerPort: 16379
  command: ["/conf/update-node.sh", "redis-server", "/conf/redis.conf"]
  livenessProbe:
    tcpSocket:
      port: client # named port
    initialDelaySeconds: 30
    timeoutSeconds: 5
    periodSeconds: 5
    failureThreshold: 5
    successThreshold: 1
  readinessProbe:
    exec:
      command:
        - redis-cli
        - ping
    initialDelaySeconds: 20
    timeoutSeconds: 5
    periodSeconds: 3
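
Note that the readinessProbe in this snippet still runs redis-cli ping directly, which is the case the question reports as staying READY while the dataset loads. A minimal sketch of a wrapper that treats any reply other than PONG as not ready, along the lines of the Bitnami script above, might look like this. The function name ping_is_ready and the path /scripts/ready.sh are hypothetical, and the sketch assumes redis-cli and a POSIX shell are present in the image.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the reply passed as $1 is exactly PONG.
ping_is_ready() {
  if [ "$1" != "PONG" ]; then
    # Print the unexpected reply (e.g. a LOADING error) so it appears in the
    # probe's failure message, then report failure.
    echo "$1"
    return 1
  fi
}

# Intended probe command (assumes redis-cli is on PATH in the container):
#   sh -c '. /scripts/ready.sh && ping_is_ready "$(redis-cli ping)"'
```

The check depends on the text of the reply rather than on the exit status of redis-cli, because the question's observations suggest the exit status is 0 even when the server answers with LOADING.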
