
I want to extend the Airflow Docker image with the HDFS provider, following:
https://airflow.apache.org/docs/docker-stack/build.html#examples-of-image-extending

The Dockerfile looks like this:

FROM apache/airflow:2.2.4
ARG DEV_APT_DEPS="\
     curl \
     gnupg2 \
     apt-transport-https \
     apt-utils \
     build-essential \
     ca-certificates \
     gnupg \
     dirmngr \
     freetds-bin \
     freetds-dev \
     gosu \
     krb5-user \
     ldap-utils \
     libffi-dev \
     libkrb5-dev \
     libldap2-dev \
     libpq-dev \
     libsasl2-2 \
     libsasl2-dev \
     libsasl2-modules \
     libssl-dev \
     locales \
     lsb-release \
     nodejs \
     openssh-client \
     postgresql-client \
     python-selinux \
     sasl2-bin \
     software-properties-common \
     sqlite3 \
     sudo \
     unixodbc \
     unixodbc-dev \
     yarn"
     
USER root
RUN mv /etc/apt/sources.list /etc/apt/sources.list.bak \
  && echo 'deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster main contrib non-free' >> /etc/apt/sources.list \
  && echo 'deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster-updates main contrib non-free' >> /etc/apt/sources.list \
  && echo 'deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster-backports main contrib non-free' >> /etc/apt/sources.list \
  && echo 'deb https://mirrors.tuna.tsinghua.edu.cn/debian-security buster/updates main contrib non-free' >> /etc/apt/sources.list \
  && apt-get update \
  && apt-get install -y --no-install-recommends \
    ${DEV_APT_DEPS} \
  && apt-get autoremove -yqq --purge \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/*

  
USER airflow
COPY --chown=airflow:root constraints-3.7.txt /opt/airflow/
COPY --chown=airflow:root ifxjdbc.jar /opt/airflow/jdbc-drivers/
RUN pip install --timeout=3600 --no-cache-dir --user \
  --constraint /opt/airflow/constraints-3.7.txt \
  --index-url https://pypi.tuna.tsinghua.edu.cn/simple \
  --trusted-host pypi.tuna.tsinghua.edu.cn \
  apache-airflow-providers-apache-hive \
  apache-airflow-providers-apache-hdfs  # this line causes the error below

The image builds successfully, but when I initialize it with docker-compose up airflow-init, I get this error:

airflow-init_1       | ....................
airflow-init_1       | ERROR! Maximum number of retries (20) reached.
airflow-init_1       | 
airflow-init_1       | Last check result:
airflow-init_1       | $ airflow db check
airflow-init_1       | Traceback (most recent call last):
airflow-init_1       |   File "/home/airflow/.local/bin/airflow", line 5, in <module>
airflow-init_1       |     from airflow.__main__ import main
airflow-init_1       |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/__main__.py", line 28, in <module>
airflow-init_1       |     from airflow.cli import cli_parser
airflow-init_1       |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 621, in <module>
airflow-init_1       |     type=argparse.FileType('w', encoding='UTF-8'),
airflow-init_1       | TypeError: __init__() got an unexpected keyword argument 'encoding'
airflow-init_1       | 
airflow-224_airflow-init_1 exited with code 1

If I remove apache-airflow-providers-apache-hdfs from the Dockerfile and rebuild, the init completes fine.

Help! I really need the HDFS provider.
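For anyone debugging a similar traceback: a quick sanity check (a diagnostic sketch, not from the original question) is to see which argparse module Python actually imports and whether it accepts the encoding keyword that the traceback complains about. The standard-library argparse in Python 3 does; a stale PyPI copy shadowing it from site-packages does not.

```python
import argparse
import inspect

# Where is argparse actually imported from?
# The stdlib copy lives under .../lib/python3.x/argparse.py;
# a path under site-packages means a PyPI package is shadowing it.
print(inspect.getfile(argparse))

# The stdlib argparse accepts the encoding keyword used by
# airflow's cli_parser; the old Python-2-era PyPI backport does not.
file_type = argparse.FileType('w', encoding='UTF-8')
print(type(file_type).__name__)  # FileType
```

If the printed path points into site-packages, or the FileType call raises the same TypeError as in the traceback, a PyPI argparse package is the culprit.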

2 Answers


  1. After a bit more looking, I think I know where your problem is – you are likely not using the "entrypoint" to run airflow. You should always use the original entrypoint https://airflow.apache.org/docs/docker-stack/entrypoint.html, and you are likely overriding it with your own entrypoint, which does not initialize the airflow environment properly.

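For reference, an accidental entrypoint override in docker-compose.yml can look like the hypothetical fragment below (service and image names are made up). The fix is to delete the entrypoint line so the image's official /entrypoint script runs and sets up the airflow environment:

```yaml
services:
  airflow-init:
    image: my-extended-airflow:2.2.4    # hypothetical image name
    # Overriding the entrypoint like this bypasses the official
    # /entrypoint script, so the airflow environment is never set up:
    entrypoint: /bin/bash
    command: -c "airflow db init"
```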
  2. This gave me some trouble too. The issue is this: argparse is now part of the Python standard library and must not be installed from PyPI.
    The package apache-airflow-providers-apache-hdfs depends on snakebite-py3, and snakebite-py3 installs the old PyPI argparse, which shadows the standard-library module.

    To solve the problem, just add RUN pip uninstall -y argparse to the end of your Dockerfile.
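Applied to the Dockerfile from the question, the final pip step would look something like this (a sketch – the flags and package list are abbreviated):

```dockerfile
# snakebite-py3 (a dependency of the hdfs provider) installs the
# Python-2-era argparse backport, which shadows the stdlib module;
# uninstalling it restores the standard-library argparse.
RUN pip install --no-cache-dir --user \
      apache-airflow-providers-apache-hive \
      apache-airflow-providers-apache-hdfs \
 && pip uninstall -y argparse
```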
