I recently installed Airflow 2.1.4 with Docker containers, and I've successfully set up Postgres, Redis, the scheduler, 2x local workers, and Flower on the same machine with docker-compose.
Now I want to expand, and set up workers on other machines.
I was able to get the workers up and running: Flower can see the worker node and the worker receives tasks from the scheduler correctly, but regardless of the actual result of the task, the task is marked as failed with an error message like the one below:
*** Log file does not exist: /opt/airflow/logs/test/test/2021-10-29T14:38:37.669734+00:00/1.log
*** Fetching from: http://b7a0154e7e20:8793/log/test/test/2021-10-29T14:38:37.669734+00:00/1.log
*** Failed to fetch log file from worker. [Errno -3] Temporary failure in name resolution
Then I tried replacing AIRFLOW__CORE__HOSTNAME_CALLABLE: 'socket.getfqdn'
with AIRFLOW__CORE__HOSTNAME_CALLABLE: 'airflow.utils.net.get_host_ip_address'
and got this error instead:
*** Log file does not exist: /opt/airflow/logs/test/test/2021-10-28T15:47:59.625675+00:00/1.log
*** Fetching from: http://172.18.0.2:8793/log/test/test/2021-10-28T15:47:59.625675+00:00/1.log
*** Failed to fetch log file from worker. [Errno 113] No route to host
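(For context: 172.18.0.2 is the worker container's address on the remote machine's internal Docker bridge network, so the webserver on the main machine has no route to it. From what I've read, one workaround is to give the worker container a fixed hostname that the main machine can resolve, roughly like the sketch below — the name is a placeholder, I haven't verified it, and it assumes the hostname callable is set back to socket.getfqdn:)

worker_3:
  # placeholder: a fixed container hostname, so socket.getfqdn records this
  # name instead of the container ID, and the main machine can map it to
  # the worker machine's IP
  hostname: worker-3-host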
Then I tried mapping port 8793 of the worker to its host machine (see worker_4 below), and now it returns:
*** Failed to fetch log file from worker. [Errno 111] Connection refused
but sometimes it still gives the "Temporary failure in name resolution" error.
I've also tried copying the URL from the error message and replacing the IP with the host machine's IP, which gave me this message:
Forbidden
You don't have the permission to access the requested resource. It is either read-protected or not readable by the server.
Please let me know if additional info is needed.
Thanks in advance!
Below is my docker-compose.yml for the scheduler/webserver/flower:
version: '3.4'

x-hosts: &extra_hosts
  postgres: XX.X.XX.XXX
  redis: XX.X.XX.XXX

x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.4}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__CORE__DEFAULT_TIMEZONE: 'America/New_York'
    AIRFLOW__CORE__HOSTNAME_CALLABLE: 'airflow.utils.net.get_host_ip_address'
    AIRFLOW_WEBSERVER_DEFAULT_UI_TIMEZONE: 'America/New_York'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:- apache-airflow-providers-slack}
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ./assets:/opt/airflow/assets
    - ./airflow.cfg:/opt/airflow/airflow.cfg
    - /etc/hostname:/etc/hostname
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-0}"
  extra_hosts: *extra_hosts

services:
  postgres:
    container_name: 'airflow-postgres'
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always
    ports:
      - '5432:5432'

  redis:
    image: redis:latest
    container_name: 'airflow-redis'
    expose:
      - 6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always
    ports:
      - '6379:6379'

  airflow-webserver:
    <<: *airflow-common
    container_name: 'airflow-webserver'
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      - redis
      - postgres

  airflow-scheduler:
    <<: *airflow-common
    container_name: 'airflow-scheduler'
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      - redis
      - postgres

  airflow-worker1:
    build: ./worker_config
    container_name: 'airflow-worker_1'
    command: celery worker -H worker_1
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    environment:
      <<: *airflow-common-env
      DUMB_INIT_SETSID: "0"
    restart: always
    depends_on:
      - redis
      - postgres
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
      - ./assets:/opt/airflow/assets
      - ./airflow.cfg:/opt/airflow/airflow.cfg
    extra_hosts: *extra_hosts

  airflow-worker2:
    build: ./worker_config
    container_name: 'airflow-worker_2'
    command: celery worker -H worker_2
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    environment:
      <<: *airflow-common-env
      DUMB_INIT_SETSID: "0"
    restart: always
    depends_on:
      - redis
      - postgres
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
      - ./assets:/opt/airflow/assets
      - ./airflow.cfg:/opt/airflow/airflow.cfg
    extra_hosts: *extra_hosts

  flower:
    <<: *airflow-common
    container_name: 'airflow_flower'
    command: celery flower
    ports:
      - 5555:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      - redis
      - postgres
and here is my docker-compose.yml for the workers on another machine:
version: '3.4'

x-hosts: &extra_hosts
  postgres: XX.X.XX.XXX
  redis: XX.X.XX.XXX

x-airflow-common:
  &airflow-common
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__CORE__DEFAULT_TIMEZONE: 'America/New_York'
    AIRFLOW__CORE__HOSTNAME_CALLABLE: 'airflow.utils.net.get_host_ip_address'
    AIRFLOW_WEBSERVER_DEFAULT_UI_TIMEZONE: 'America/New_York'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ./assets:/opt/airflow/assets
    - ./airflow.cfg:/opt/airflow/airflow.cfg
    - /etc/hostname:/etc/hostname
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-0}"
  extra_hosts: *extra_hosts

services:
  worker_3:
    build: ./worker_config
    restart: always
    extra_hosts: *extra_hosts
    volumes:
      - ./airflow.cfg:/opt/airflow/airflow.cfg
      - ./dags:/opt/airflow/dags
      - ./assets:/opt/airflow/assets
      - ./logs:/opt/airflow/logs
      - /etc/hostname:/etc/hostname
    entrypoint: airflow celery worker -H worker_3
    environment:
      <<: *airflow-common-env
      WORKER_NAME: worker_147
    healthcheck:
      test: ['CMD-SHELL', '[ -f /usr/local/airflow/airflow-worker.pid ]']
      interval: 30s
      timeout: 30s
      retries: 3

  worker_4:
    build: ./worker_config_py2
    restart: always
    extra_hosts: *extra_hosts
    volumes:
      - ./airflow.cfg:/opt/airflow/airflow.cfg
      - ./dags:/opt/airflow/dags
      - ./assets:/opt/airflow/assets
      - ./logs:/opt/airflow/logs
      - /etc/hostname:/etc/hostname
    entrypoint: airflow celery worker -H worker_4_py2 -q py2
    environment:
      <<: *airflow-common-env
      WORKER_NAME: worker_4_py2
    healthcheck:
      test: ['CMD-SHELL', '[ -f /usr/local/airflow/airflow-worker.pid ]']
      interval: 30s
      timeout: 30s
      retries: 3
    ports:
      - 8793:8793
2 Answers
For this issue: "Failed to fetch log file from worker. [Errno -3] Temporary failure in name resolution"
It looks like the worker's hostname is not being resolved correctly. The webserver on the master machine has to reach the worker to fetch the log file and display it in the UI, and to do that it first has to resolve the worker's hostname. That resolution is clearly failing, so:
1. Add a hostname-to-IP mapping for the worker on the master (vim /etc/hosts); a compose-file equivalent is sketched after this list.
2. If you are using the LocalExecutor, the scheduler and the webserver must be on the same host.
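For point 1, if you prefer to keep the mapping in the compose file rather than editing /etc/hosts by hand, the same thing can be expressed through the extra_hosts anchor the webserver already uses. This is only a sketch with placeholder values: the key must be whatever hostname the worker actually records for its task instances (the container ID b7a0154e7e20 in the error above, unless you pin it with hostname: on the worker service), and the value is the IP of the machine the worker runs on.

x-hosts: &extra_hosts
  postgres: XX.X.XX.XXX
  redis: XX.X.XX.XXX
  # placeholder: hostname recorded by the remote worker -> IP of its host machine
  worker-3-host: XX.X.XX.XXX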
Docker file:
Here are the dependencies for deploying the webserver with Docker:
webserver:
To fix it:
First of all, get the configuration file by typing:
After that, check that fixPermissions is true.
You need to enable persistent volumes:
# Enable persistent volumes
enabled: true
# Volume size for worker StatefulSet
size: 10Gi
# If using a custom storageClass, pass name ref to all statefulSets here
storageClassName:
# Execute init container to chown log directory.
fixPermissions: true
Update your installation by:
Your scheduler does not expose the 8793 port. Try to expose it in the docker-compose.yml file.
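For what it's worth, in this CeleryExecutor setup it is the remote worker's log server that listens on 8793 (the URL in the error points at the worker), so that is the service whose port needs to be published. A minimal sketch for the worker compose file, along the lines of what worker_4 above already does:

worker_3:
  ports:
    - "8793:8793"   # publish the worker log server so the webserver can fetch task logs

If the webserver still gets "connection refused" after that, it is also worth checking that the worker machine's firewall allows inbound connections on 8793.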