
Today is the 8th of September 2023. I run Airflow in containers; the worker, scheduler, etc. are all up.
I have the following DAG:

from airflow import DAG
from datetime import datetime
from airflow.providers.postgres.operators.postgres import PostgresOperator

with DAG('user_processing',
         start_date=datetime(2023, 10, 12),
         schedule_interval='@daily',
         catchup=False) as dag:

    create_table = PostgresOperator(
        task_id='create_table',
        postgres_conn_id='postgres',
        sql='''
        CREATE TABLE IF NOT EXISTS users (
            firstname TEXT NOT NULL,
            lastname TEXT NOT NULL,
            country TEXT NOT NULL,
            username TEXT NOT NULL,
            password TEXT NOT NULL,
            email TEXT NOT NULL
        );
        '''
    )

If I set start_date in the future and trigger a manual run, the task is marked as success but is not actually executed (it takes 00:00:00 to run, and I made a deliberate mistake in the SQL to check whether it would error out).
If I set start_date in the past or present (e.g. start_date=datetime(2022, 1, 1)), the task is marked as failed and the logs are empty. I am sure there is no mistake in the task itself, since I can run the SQL from the Scheduler-1 container without any problems.

I am very confused about what can be wrong. Please help.

2 Answers


  1. Chosen as BEST ANSWER

    So, in the end the problem returned and I realised that I had a permissions issue. In the beginning, when I tried to create DAGs in the /dags folder, I received a missing-permissions error, and ls -la showed that the folder was not owned by my user.

    So, to be able to create DAG files, I ran this command: sudo chown -R betelgeitze ~/data_eng/airflow/dags

    Now, depending on when you started docker-compose and when you changed the permissions, the DAGs might not run. In my case it sometimes worked and sometimes did not; I could not figure out the exact steps after which it stops working. However, if you run into the problem described in this question, all you need to do is change the ownership back to the Airflow user and it will work. In my case that is: sudo chown -R 50000 ~/data_eng/airflow/dags (replace 50000 with your Airflow user's UID and adjust the path to where your folder is: sudo chown -R <airflow uid> <path>).
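    Before reaching for chown, the ownership mismatch can be verified from Python with the standard library. A minimal sketch (the UID 50000 and the dags path are just the values from my setup, not anything Airflow requires):

    ```python
    import os

    def find_files_not_owned_by(folder, expected_uid):
        """Walk folder and return paths whose owner UID differs from expected_uid."""
        mismatched = []
        for root, _dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                if os.stat(path).st_uid != expected_uid:
                    mismatched.append(path)
        return mismatched

    # Example: list DAG files not owned by the Airflow user (UID 50000 in my setup)
    # print(find_files_not_owned_by(os.path.expanduser("~/data_eng/airflow/dags"), 50000))
    ```

    If this returns a non-empty list for the UID your Airflow containers run as, that is the same symptom I hit.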


  2. For DAGs that run daily I normally define the start_date argument as follows:

    from airflow.utils.dates import days_ago
    
    default_args = {
       ...,
   'start_date': days_ago(1),
       ...
    }
    

    The DAG will run once when you deploy it, and past executions will not be queued. Note that if your DAG's periodicity is not daily, you will need to adjust the number of days.
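    For reference, `days_ago(1)` resolves to midnight UTC of the previous day. A stdlib sketch of what it evaluates to (assuming the default UTC behaviour of `airflow.utils.dates.days_ago`, no custom arguments):

    ```python
    from datetime import datetime, timedelta, timezone

    def days_ago_sketch(n):
        # Midnight UTC, n days in the past -- approximates airflow.utils.dates.days_ago(n)
        today_midnight = datetime.now(timezone.utc).replace(
            hour=0, minute=0, second=0, microsecond=0
        )
        return today_midnight - timedelta(days=n)
    ```

    Because that midnight is always in the past, the scheduler considers the first interval complete and triggers one run on deployment.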

    In addition, if you only want to run the DAG on demand, I would recommend setting schedule_interval to None rather than setting a future start_date; a future start_date makes no sense for manual runs.
