I want to extend Airflow on Docker with the HDFS provider, following:
https://airflow.apache.org/docs/docker-stack/build.html#examples-of-image-extending
The Dockerfile looks like this:
FROM apache/airflow:2.2.4
ARG DEV_APT_DEPS="\
    curl \
    gnupg2 \
    apt-transport-https \
    apt-utils \
    build-essential \
    ca-certificates \
    gnupg \
    dirmngr \
    freetds-bin \
    freetds-dev \
    gosu \
    krb5-user \
    ldap-utils \
    libffi-dev \
    libkrb5-dev \
    libldap2-dev \
    libpq-dev \
    libsasl2-2 \
    libsasl2-dev \
    libsasl2-modules \
    libssl-dev \
    locales \
    lsb-release \
    nodejs \
    openssh-client \
    postgresql-client \
    python-selinux \
    sasl2-bin \
    software-properties-common \
    sqlite3 \
    sudo \
    unixodbc \
    unixodbc-dev \
    yarn"
USER root
RUN mv /etc/apt/sources.list /etc/apt/sources.list.bak \
    && echo 'deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster main contrib non-free' >> /etc/apt/sources.list \
    && echo 'deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster-updates main contrib non-free' >> /etc/apt/sources.list \
    && echo 'deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster-backports main contrib non-free' >> /etc/apt/sources.list \
    && echo 'deb https://mirrors.tuna.tsinghua.edu.cn/debian-security buster/updates main contrib non-free' >> /etc/apt/sources.list \
    && apt-get update \
    && apt-get install -y --no-install-recommends ${DEV_APT_DEPS} \
    && apt-get autoremove -yqq --purge \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
USER airflow
COPY --chown=airflow:root constraints-3.7.txt /opt/airflow/
COPY --chown=airflow:root ifxjdbc.jar /opt/airflow/jdbc-drivers/
RUN pip install --timeout=3600 --no-cache-dir --user \
    --constraint /opt/airflow/constraints-3.7.txt \
    --index-url https://pypi.tuna.tsinghua.edu.cn/simple \
    --trusted-host pypi.tuna.tsinghua.edu.cn \
    apache-airflow-providers-apache-hive \
    apache-airflow-providers-apache-hdfs  # this line causes the error described below
The image builds successfully, but when I initialize it with docker-compose up airflow-init, I get an error:
airflow-init_1 | ....................
airflow-init_1 | ERROR! Maximum number of retries (20) reached.
airflow-init_1 |
airflow-init_1 | Last check result:
airflow-init_1 | $ airflow db check
airflow-init_1 | Traceback (most recent call last):
airflow-init_1 | File "/home/airflow/.local/bin/airflow", line 5, in <module>
airflow-init_1 | from airflow.__main__ import main
airflow-init_1 | File "/home/airflow/.local/lib/python3.7/site-packages/airflow/__main__.py", line 28, in <module>
airflow-init_1 | from airflow.cli import cli_parser
airflow-init_1 | File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 621, in <module>
airflow-init_1 | type=argparse.FileType('w', encoding='UTF-8'),
airflow-init_1 | TypeError: __init__() got an unexpected keyword argument 'encoding'
airflow-init_1 |
airflow-224_airflow-init_1 exited with code 1
If I remove apache-airflow-providers-apache-hdfs from the Dockerfile and rebuild, the init works fine.
Help! I really need the HDFS provider.
2 Answers
After a bit more looking I think I know where your problem is: you are likely not using the image's "entrypoint" to run Airflow. You should always use the original entrypoint (https://airflow.apache.org/docs/docker-stack/entrypoint.html); you are likely overriding it with your own entrypoint, which does not initialize the Airflow environment properly.
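For example, a minimal docker-compose sketch (the service name, image tag, and connection string are placeholders for illustration, not taken from your setup):

airflow-init:
  image: my-extended-airflow:2.2.4   # the image built from your Dockerfile
  # No "entrypoint:" key here: the base image's entrypoint runs first,
  # initializes the Airflow environment, then passes the command below
  # to the "airflow" CLI, i.e. it runs "airflow db init".
  command: db init
  environment:
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow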
This gave me some trouble too. The issue is this: argparse is now part of the Python standard library and must not be installed as a separate package.
The package apache-airflow-providers-apache-hdfs requires snakebite-py3, and snakebite-py3 installs the old argparse backport from PyPI. That backport shadows the standard-library module, and its FileType does not accept the encoding keyword argument, which is exactly the TypeError in your traceback.
To solve the problem, just add RUN pip uninstall -y argparse to the end of your Dockerfile, as shown below.
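For reference, a sketch of the corrected tail of the Dockerfile (the same install lines as above, with the uninstall appended):

RUN pip install --timeout=3600 --no-cache-dir --user \
    --constraint /opt/airflow/constraints-3.7.txt \
    --index-url https://pypi.tuna.tsinghua.edu.cn/simple \
    --trusted-host pypi.tuna.tsinghua.edu.cn \
    apache-airflow-providers-apache-hive \
    apache-airflow-providers-apache-hdfs
# snakebite-py3 pulls in the PyPI "argparse" backport; remove it so the
# standard-library argparse is used instead.
RUN pip uninstall -y argparse

You can check the result before running docker-compose (the image tag is a placeholder; the Airflow entrypoint passes a leading "python" argument straight to the Python interpreter):

docker run --rm my-extended-airflow:2.2.4 python -c "import argparse; print(argparse.__file__)"
# should print a path under /usr/local/lib/python3.7/ (the standard library),
# not under /home/airflow/.local/lib/python3.7/site-packages/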