I have been struggling with an issue in my Apache Airflow DAG for some time now. I am trying to use the snowflake-connector-python module in my DAG, and it is included in my requirements.txt file. However, when I try to run the DAG, I keep getting the following error message:
`Broken DAG: [/usr/local/airflow/dags/my_dag.py] Traceback (most recent call last):`
`File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed`
`File "/usr/local/airflow/dags/my_dag.py", line 12, in <module>`
`from snowflake.connector.pandas_tools import pd_writer`
`ModuleNotFoundError: No module named 'snowflake'`
I have tried everything that I could think of, including:
- Checking that the module is included in the requirements.txt file and that the file is correctly formatted
- Restarting the Airflow scheduler and webserver
- Clearing the Airflow DAG cache
- Checking that the correct Python environment is being used
- Installing the module using pip in the correct environment
Despite all of these efforts, the error message persists. Does anyone have any other suggestions for how to resolve this issue? I would greatly appreciate any help or advice. Thank you.
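One way to make the "correct Python environment" check concrete is to ask the interpreter that actually parses the DAG whether it can see the package. The following is a minimal sketch, not an Airflow feature: `describe_module` is a hypothetical helper name, and it only handles top-level package names such as `snowflake`.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether the top-level package `name` is visible to this interpreter."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        locations = []
    elif spec.submodule_search_locations is not None:
        # Regular and namespace packages expose their directories here.
        locations = list(spec.submodule_search_locations)
    else:
        # Plain modules and built-ins only have an origin.
        locations = [spec.origin]
    return {
        "module": name,
        "found": spec is not None,
        "locations": locations,
        "interpreter": sys.executable,
    }


if __name__ == "__main__":
    print(describe_module("snowflake"))
```

Logging this result from a task or from the top of the DAG file shows which interpreter is in use and whether it matches the environment where pip installed the package.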
2 Answers
MWAA is a tricky beast; I hope this helps:

First, check the WebServer log group in CloudWatch. It has a separate `requirements_install` log stream that tells you exactly what was installed during service startup. You will find the errors there.

This is not well documented, but if you set up a private MWAA WebServer, it won't have internet access, not even for downloading packages. The solution is to install the packages from `requirements.txt` into `plugins.zip` and upload it to the bucket that MWAA uses.

Also make sure to follow the best practices for MWAA:
https://docs.aws.amazon.com/mwaa/latest/userguide/working-dags-dependencies.html
https://docs.aws.amazon.com/mwaa/latest/userguide/best-practices-dependencies.html#best-practices-dependencies-python-wheels-s3
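The first step above, reading the `requirements_install` log stream, can also be scripted. This is a rough sketch under a few assumptions: `find_install_errors` and `ERROR_MARKERS` are hypothetical names, the marker strings are just common pip failure messages, and the log group name in the usage comment assumes the usual `airflow-<environment name>-WebServer` pattern. The client is passed in so the function works with a boto3 CloudWatch Logs client or any object that offers the same `filter_log_events` call.

```python
# Substrings that often appear in failed pip installs (assumed, not exhaustive).
ERROR_MARKERS = ("ERROR", "No matching distribution", "Could not find a version")


def find_install_errors(logs_client, log_group, stream_prefix="requirements_install"):
    """Return (log stream, message) pairs that look like pip failures."""
    hits = []
    kwargs = {"logGroupName": log_group, "logStreamNamePrefix": stream_prefix}
    while True:
        page = logs_client.filter_log_events(**kwargs)
        for event in page.get("events", []):
            message = event.get("message", "")
            if any(marker in message for marker in ERROR_MARKERS):
                hits.append((event.get("logStreamName", ""), message.strip()))
        token = page.get("nextToken")
        if not token:
            break
        # Ask for the next page of results.
        kwargs["nextToken"] = token
    return hits


# Example usage (needs the third-party boto3 package and AWS credentials;
# "MyEnvironment" is a placeholder environment name):
#   import boto3
#   for stream, message in find_install_errors(
#           boto3.client("logs"), "airflow-MyEnvironment-WebServer"):
#       print(stream, message)
```

If this returns nothing while the DAG still fails to import, the package was probably never requested at all, which points back at the requirements.txt that the environment is actually using.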
Making changes to requirements.txt can be tricky, which is why I always recommend that developers first test with mwaa-local-runner. With this project, you can run ./mwaa-local-env test-requirements to validate any changes locally before you push them to your MWAA environment.
If you are having issues, troubleshooting them locally is a lot easier and, in my experience, will normally lead you to what you need to change.
Typically when I need to make changes to the requirements.txt, this is what I do:
(in the above example, my "modified" constraints file that I copied to the S3 Dags folder is called updated-constraints.txt)
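A common reason to edit the constraints file is that a pin in requirements.txt disagrees with it, in which case pip rejects the install. Here is a minimal sketch for spotting such mismatches before uploading, assuming both files use plain `name==version` pins. `parse_pins` and `find_conflicts` are hypothetical helpers, not part of mwaa-local-runner, and the version numbers in the demo are made up for illustration.

```python
import re

# Matches "name==version" and "name[extras]==version"; other lines are skipped.
_PIN = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*==\s*([^\s;#]+)"
)


def parse_pins(text):
    """Map normalized package name -> pinned version for exact pins in `text`."""
    pins = {}
    for line in text.splitlines():
        match = _PIN.match(line)
        if match:
            # Treat "-", "_" and "." as equivalent and ignore case, as pip does.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
    return pins


def find_conflicts(requirements_text, constraints_text):
    """Return {name: (requested, constrained)} where the two pins differ."""
    wanted = parse_pins(requirements_text)
    allowed = parse_pins(constraints_text)
    return {
        name: (version, allowed[name])
        for name, version in wanted.items()
        if name in allowed and allowed[name] != version
    }


if __name__ == "__main__":
    # Illustrative contents only; read your real files instead.
    requirements = "snowflake-connector-python==3.0.4\nrequests==2.31.0\n"
    constraints = "snowflake-connector-python==2.9.0\nrequests==2.31.0\n"
    print(find_conflicts(requirements, constraints))
```

Unpinned lines, comments, and option lines such as `--constraint` are ignored, so this only catches the exact-pin case; running test-requirements in mwaa-local-runner remains the real check.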
You can see how I have done this with some code – https://github.com/094459/cdk-mwaa-redshift