I am deploying Airflow with the official Helm chart and am trying to understand why it requires a StatefulSet for the worker deployment. While that makes perfect sense for Redis and PostgreSQL, I am not sure why it is a requirement for the workers.
Answers
Actually, the official Helm chart chooses between a StatefulSet and a Deployment for your workers, based on your persistence configuration:
- if worker persistence is enabled: a StatefulSet, so that its volumeClaimTemplates create one PVC, and therefore one PV, per pod;
- if worker persistence is disabled: a Deployment.
Here is the link to the condition the chart uses to choose between the two resources.
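The choice above can be sketched in Python as a plain function over the chart values. This is only an illustration, not the chart's template code. It assumes the flag lives at workers.persistence.enabled and defaults to true, and it uses the standard Kubernetes rule that a StatefulSet names each PVC <claim-template>-<statefulset>-<ordinal>. The names "airflow-worker" and "logs" in the usage example are hypothetical.

```python
from typing import Any


def worker_kind(values: dict[str, Any]) -> str:
    """Mimic the chart's choice: StatefulSet when worker persistence is on."""
    # Assumption: the flag is workers.persistence.enabled and defaults to True.
    persistence = values.get("workers", {}).get("persistence", {}).get("enabled", True)
    return "StatefulSet" if persistence else "Deployment"


def worker_pvc_names(statefulset: str, claim_template: str, replicas: int) -> list[str]:
    """PVCs a StatefulSet creates from one volumeClaimTemplate: one per pod ordinal."""
    return [f"{claim_template}-{statefulset}-{i}" for i in range(replicas)]


if __name__ == "__main__":
    print(worker_kind({}))  # StatefulSet
    print(worker_kind({"workers": {"persistence": {"enabled": False}}}))  # Deployment
    # Hypothetical names: StatefulSet "airflow-worker", claim template "logs".
    print(worker_pvc_names("airflow-worker", "logs", 3))
```

Because each PVC name is tied to a pod ordinal, worker 0 always gets back the same volume when it is recreated, which a Deployment cannot guarantee.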
At least originally, the workers were deployed as a StatefulSet because the task logs were stored on the persistent volume tied to each worker's stable identity. When the webserver requested logs from a worker, it had to address that specific member of the set, e.g. celery-0 or celery-1. If the webserver queried a random worker instead, it got no logs back, which was a common "bug/problem" people ran into during deployment.
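A simplified simulation of why the stable identity matters: Airflow records the hostname of the worker that ran a task, and the webserver later asks that host for the log. The naming below is a simplification of Kubernetes behaviour (a counter stands in for the random Deployment suffix), the "/log/" path is a simplified form of the worker log server URL, and 8793 is assumed to be the worker log server port; all function names are hypothetical.

```python
import itertools

# Stand-in for the random suffix Kubernetes gives Deployment pods; each call is unique.
_suffixes = itertools.count()


def new_pod_name(kind: str, base: str, ordinal: int) -> str:
    """Name a freshly created worker pod (simplified Kubernetes naming)."""
    if kind == "StatefulSet":
        # StatefulSet pods are always <name>-<ordinal>, even after being recreated.
        return f"{base}-{ordinal}"
    # Deployment pods get a new generated suffix every time they are created.
    return f"{base}-{next(_suffixes):05x}"


def log_url(hostname: str, port: int = 8793) -> str:
    """Where the webserver would ask a worker for logs (simplified path)."""
    return f"http://{hostname}:{port}/log/"


def logs_reachable_after_restart(kind: str, base: str = "airflow-worker") -> bool:
    """Is the hostname recorded for a finished task still a live pod after a restart?"""
    recorded_host = new_pod_name(kind, base, 0)  # hostname saved when the task ran
    live_pods = {new_pod_name(kind, base, 0)}    # the controller recreates the pod
    # StatefulSet: same name (and same PVC) comes back. Deployment: a different name.
    return recorded_host in live_pods


if __name__ == "__main__":
    print(logs_reachable_after_restart("StatefulSet"))  # True
    print(logs_reachable_after_restart("Deployment"))   # False
    print(log_url("airflow-worker-0"))  # http://airflow-worker-0:8793/log/
```

In a real cluster the pod name alone is not enough to reach it; a headless Service is what gives StatefulSet pods resolvable, stable DNS names.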
This may or may not still be true today, but it is explained clearly in the documentation of the community chart (airflow-helm, version 7.15.0): https://artifacthub.io/packages/helm/airflow-helm/airflow/7.15.0#docs-kubernetes—worker-autoscaling