
I have created a DAG to upload a local file into a personal S3 Bucket. However, when accessing http://localhost:9099/home I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\plata\OneDrive\Υπολογιστής\projects backups\airflow-sqlserver\dags\pricedata.xlsx'
[Screenshot: Airflow error – broken DAG]

I have a Windows PC and I am running Airflow in a Docker container.

Here is the DAG’s code:

# airflow related
from airflow import DAG
from airflow.operators.python import PythonOperator
# other packages
from datetime import datetime
import boto3

with DAG(
    dag_id='file_to_s3',
    start_date=datetime(2022, 12, 5),
    catchup=False,
) as dag:
    pass


def file_to_s3():
    #Creating Session With Boto3.
    session = boto3.Session(
    aws_access_key_id='my_access_key_id',
    aws_secret_access_key='my_secret_access_key'
    )

    #Creating S3 Resource From the Session.
    s3 = session.resource('s3')

    result = s3.Bucket('flight-data-test-bucket').upload_file(r'C:\Users\plata\OneDrive\Υπολογιστής\projects backups\airflow-sqlserver\dags\pricedata.xlsx', 'pricedata.xlsx')

    return (result)


with DAG(
    dag_id='file_to_s3',
    start_date=datetime(2022, 12, 5),
    catchup=False
) as dag:
    # Upload the file
    task_file_to_s3 = PythonOperator(
        task_id='file_to_s3',
        python_callable=file_to_s3()
    )

I can’t understand why that happens since I have already stored my local file into my "dags" folder:
[Screenshot: pricedata.xlsx location inside the dags folder]

And my "dags" folder is already mounted in the docker-compose.yml file which can be seen below:


  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    # For backward compatibility, with Airflow <2.3
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ./data:/opt/airflow/data
  user: "${AIRFLOW_UID:-50000}:0"

Any ideas? Could this problem be caused by the fact that I am running Airflow on Windows through Docker?

2 Answers


  1. Chosen as BEST ANSWER

    I was able to fix it by changing the path to: result = s3.Bucket('flight-data-test-bucket').upload_file('/opt/airflow/dags/pricedata.xlsx', 'pricedata.xlsx')

    I also had to change python_callable=file_to_s3() to python_callable=file_to_s3 (passing the function itself instead of calling it).
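
    Putting both fixes together, a minimal sketch of the corrected DAG (the bucket name comes from the question; the AWS credentials are placeholders, and in practice an Airflow connection or the S3 hook would be preferable to hard-coding them):

    # airflow related
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    # other packages
    from datetime import datetime
    import boto3


    def file_to_s3():
        # Create a boto3 session (placeholder credentials)
        session = boto3.Session(
            aws_access_key_id='my_access_key_id',
            aws_secret_access_key='my_secret_access_key',
        )
        s3 = session.resource('s3')
        # Use the path as seen inside the container, not the Windows host path
        return s3.Bucket('flight-data-test-bucket').upload_file(
            '/opt/airflow/dags/pricedata.xlsx', 'pricedata.xlsx'
        )


    with DAG(
        dag_id='file_to_s3',
        start_date=datetime(2022, 12, 5),
        catchup=False,
    ) as dag:
        task_file_to_s3 = PythonOperator(
            task_id='file_to_s3',
            python_callable=file_to_s3,  # pass the function itself, do not call it
        )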


  2. The file system of your Docker containers is not shared with Windows by default.

    You can mount a drive so that you can persist files and share them between your windows and your docker:

    https://www.docker.com/blog/file-sharing-with-docker-desktop/

    Note that in your DAG code, you will need the file path as seen from inside your Docker container.

    With your docker-compose file, it looks like your xlsx file is mounted here:
    ./dags:/opt/airflow/dags

    So I assume that in your DAG code, you could try:

    result = s3.Bucket('flight-data-test-bucket').upload_file('/opt/airflow/dags/pricedata.xlsx', 'pricedata.xlsx')

    It might also be a good idea to mount an additional volume for your project data outside of the DAG folder.
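
    For example, your docker-compose file already mounts ./data:/opt/airflow/data, so one option (a sketch, assuming you copy pricedata.xlsx into the ./data folder on the Windows side) would be to point the upload at that mount instead:

    result = s3.Bucket('flight-data-test-bucket').upload_file('/opt/airflow/data/pricedata.xlsx', 'pricedata.xlsx')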
