
I am deploying Airflow with the official Helm chart and trying to understand why it requires a StatefulSet for the workers. While that makes perfect sense for Redis and PostgreSQL, I am not sure why it is a requirement for the workers.


Answers


  1. In fact, the official Helm chart chooses between a StatefulSet and a Deployment for the workers based on your persistence configuration:

    • If persistence is enabled (the default), it uses a StatefulSet, whose volume claim template creates one PVC, and therefore one PV, per worker pod.
    • If persistence is disabled, it uses a Deployment.

    Here is the link to the condition the chart uses to choose between the two resources.
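    As a rough illustration, the selection rule in this answer can be modeled in a few lines of Python. This is a simplified sketch, not the chart's template code: it assumes the switch is the `workers.persistence.enabled` value and that the volume claim template is named `logs` (both worth checking against your chart version). The `<claim template>-<StatefulSet name>-<ordinal>` PVC naming is standard StatefulSet behavior.

```python
from typing import List


def worker_kind(persistence_enabled: bool = True) -> str:
    # Simplified model of the chart's switch on workers.persistence.enabled
    # (assumed key name); persistence defaults to enabled.
    return "StatefulSet" if persistence_enabled else "Deployment"


def worker_pvc_names(statefulset: str, replicas: int, claim_template: str = "logs") -> List[str]:
    # A StatefulSet creates one PVC per pod from its volumeClaimTemplate,
    # named '<claimTemplate>-<statefulSetName>-<ordinal>'.
    # The 'logs' template name is an assumption for illustration.
    return [f"{claim_template}-{statefulset}-{i}" for i in range(replicas)]


if __name__ == "__main__":
    print(worker_kind())                           # StatefulSet
    print(worker_kind(persistence_enabled=False))  # Deployment
    print(worker_pvc_names("airflow-worker", 2))   # ['logs-airflow-worker-0', 'logs-airflow-worker-1']
```

    Because each PVC name embeds the pod's ordinal, a restarted `airflow-worker-1` reattaches to the same volume; a Deployment has no equivalent per-pod claim, which is why the chart only uses it when persistence is off.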

  2. At least originally, the workers were deployed as a StatefulSet because task logs were stored on the persistent volume attached to each worker pod. When the webserver requests a task's logs, it has to address the specific worker that ran the task by its stable StatefulSet identity, e.g. celery-0, celery-1. If the webserver queried an arbitrary worker instead, that worker would have no logs to return, which was a common "bug" people ran into during deployment.

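    This may no longer be the case, but it is explained clearly here (note that this page documents the community-maintained airflow-helm chart, version 7.15.0, not the official Apache Airflow chart): https://artifacthub.io/packages/helm/airflow-helm/airflow/7.15.0#docs-kubernetes—worker-autoscaling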
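    A small Python model of the log-lookup problem described in this answer. Everything here is a simplified, hypothetical sketch: the pod names, the log path, and the helper functions are illustrative. 8793 is Airflow's default `worker_log_server_port`; real Deployment pod names carry a ReplicaSet hash plus a random suffix rather than the single random suffix used here.

```python
import secrets

# Airflow's default worker_log_server_port (assumed unchanged in this sketch).
LOG_SERVER_PORT = 8793


def statefulset_pod_name(name: str, ordinal: int) -> str:
    # StatefulSet pods keep '<name>-<ordinal>' across restarts.
    return f"{name}-{ordinal}"


def deployment_pod_name(name: str) -> str:
    # Simplified: a Deployment pod gets a fresh random suffix each time it is recreated.
    return f"{name}-{secrets.token_hex(3)}"


def log_url(hostname: str, relative_path: str) -> str:
    # Hypothetical helper approximating the URL the webserver uses to ask the
    # worker (identified by the hostname recorded for the task) for a log file.
    return f"http://{hostname}:{LOG_SERVER_PORT}/log/{relative_path}"


if __name__ == "__main__":
    # Hypothetical relative log path for one task attempt.
    path = "dag_id=example/run_id=manual/task_id=t1/attempt=1.log"

    # The hostname recorded when the task ran on a StatefulSet worker...
    recorded = statefulset_pod_name("airflow-worker", 0)
    # ...is the same name the pod gets back after a restart.
    print(recorded == statefulset_pod_name("airflow-worker", 0))  # True
    print(log_url(recorded, path))

    # A Deployment pod comes back under a different, randomly suffixed name.
    print(deployment_pod_name("airflow-worker"))
```

    The StatefulSet name survives a restart, so a hostname recorded when the task ran still points at the pod that owns the log volume; a Deployment pod returns under a new name and, without persistence, without the old logs.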
