
I am trying to set up remote logging with the stable/airflow Helm chart on Airflow v1.10.9. I am using the Kubernetes executor and the puckel/docker-airflow image. Here's my values.yaml file:

airflow:
  image:
     repository: airflow-docker-local
     tag: 1.10.9
  executor: Kubernetes
  service:
    type: LoadBalancer
  config:
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: airflow-docker-local
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: 1.10.9
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_IMAGE_PULL_POLICY: Never
    AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME: airflow
    AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM: airflow
    AIRFLOW__KUBERNETES__NAMESPACE: airflow
    AIRFLOW__CORE__REMOTE_LOGGING: True
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: "s3://xxx"
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: "s3://aws_access_key_id:aws_secret_access_key@bucket"
    AIRFLOW__CORE__ENCRYPT_S3_LOGS: False
persistence:
  enabled: true
  existingClaim: ''
postgresql:
  enabled: true
workers:
  enabled: false
redis:
  enabled: false
flower:
  enabled: false

But my logs don't get exported to S3; all I get in the UI is:

*** Log file does not exist: /usr/local/airflow/logs/icp_job_dag/icp-kube-job/2019-02-13T00:00:00+00:00/1.log
*** Fetching from: http://icpjobdagicpkubejob-f4144a374f7a4ac9b18c94f058bc7672:8793/log/icp_job_dag/icp-kube-job/2019-02-13T00:00:00+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='icpjobdagicpkubejob-f4144a374f7a4ac9b18c94f058bc7672', port=8793): Max retries exceeded with url: /log/icp_job_dag/icp-kube-job/2019-02-13T00:00:00+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f511c883710>: Failed to establish a new connection: [Errno -2] Name or service not known'))

Does anyone have more insight into what I could be missing?

Edit: following @trejas's suggestion below, I created a separate connection and am using that. Here's what the airflow config in my values.yaml looks like:

airflow:
  image:
     repository: airflow-docker-local
     tag: 1.10.9
  executor: Kubernetes
  service:
    type: LoadBalancer
  connections:
  - id: my_aws
    type: aws
    extra: '{"aws_access_key_id": "xxxx", "aws_secret_access_key": "xxxx", "region_name":"us-west-2"}'
  config:
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: airflow-docker-local
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: 1.10.9
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_IMAGE_PULL_POLICY: Never
    AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME: airflow
    AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM: airflow
    AIRFLOW__KUBERNETES__NAMESPACE: airflow

    AIRFLOW__CORE__REMOTE_LOGGING: True
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: s3://airflow.logs
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: my_aws
    AIRFLOW__CORE__ENCRYPT_S3_LOGS: False

I still have the same issue.

2 Answers


  1. Your remote_log_conn_id needs to be the ID of a connection in the connections form/list, not a connection string.

    https://airflow.apache.org/docs/stable/howto/write-logs.html

    https://airflow.apache.org/docs/stable/howto/connection/index.html
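    For example, the connection could be registered under an ID with the Airflow 1.10 CLI and then referenced by that ID. (The ID `my_aws` and the credential values here are placeholders, mirroring the question.)

    ```shell
    # Register an aws-type connection under the ID "my_aws", then point
    # AIRFLOW__CORE__REMOTE_LOG_CONN_ID at that ID (not at a URI).
    airflow connections --add \
        --conn_id my_aws \
        --conn_type aws \
        --conn_extra '{"aws_access_key_id": "xxxx", "aws_secret_access_key": "xxxx", "region_name": "us-west-2"}'
    ```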

  2. I was running into the same issue and thought I'd follow up with what ended up working for me. The connection is correct, but you need to make sure the worker pods have the same environment variables:

    airflow:
      image:
         repository: airflow-docker-local
         tag: 1.10.9
      executor: Kubernetes
      service:
        type: LoadBalancer
      connections:
      - id: my_aws
        type: aws
        extra: '{"aws_access_key_id": "xxxx", "aws_secret_access_key": "xxxx", "region_name":"us-west-2"}'
      config:
        AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: airflow-docker-local
        AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: 1.10.9
        AIRFLOW__KUBERNETES__WORKER_CONTAINER_IMAGE_PULL_POLICY: Never
        AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME: airflow
        AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM: airflow
        AIRFLOW__KUBERNETES__NAMESPACE: airflow
    
        AIRFLOW__CORE__REMOTE_LOGGING: True
        AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: s3://airflow.logs
        AIRFLOW__CORE__REMOTE_LOG_CONN_ID: my_aws
        AIRFLOW__CORE__ENCRYPT_S3_LOGS: False
        AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOGGING: True
        AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOG_CONN_ID: my_aws
        AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: s3://airflow.logs
        AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__ENCRYPT_S3_LOGS: False
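    The naming convention can be confusing: everything after the `AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__` prefix becomes an environment variable on the worker pods. A minimal sketch of that mapping (my own illustration, not Airflow's actual code):

    ```python
    # The Kubernetes executor strips this prefix and injects the remainder
    # as an environment variable into each worker pod spec.
    PREFIX = "AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__"

    def worker_env_name(scheduler_setting: str) -> str:
        """Return the env var name the worker pod will see."""
        if not scheduler_setting.startswith(PREFIX):
            raise ValueError("not a kubernetes_environment_variables setting")
        return scheduler_setting[len(PREFIX):]

    # The scheduler-side setting below becomes AIRFLOW__CORE__REMOTE_LOGGING
    # inside the worker pod, which is what makes the worker upload to S3.
    print(worker_env_name(PREFIX + "AIRFLOW__CORE__REMOTE_LOGGING"))
    ```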
    
    

    I also had to set the fernet key for the workers (and in general); otherwise I got an invalid token error:

    airflow:
      fernet_key: "abcdefghijkl1234567890zxcvbnmasdfghyrewsdsddfd="
    
      config:
        AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__FERNET_KEY: "abcdefghijkl1234567890zxcvbnmasdfghyrewsdsddfd="
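    If you need to generate a key: a Fernet key is 32 random bytes, url-safe base64-encoded. Airflow's docs suggest `cryptography`'s `Fernet.generate_key()`; this stdlib-only sketch produces a key in the same format:

    ```python
    # Generate a Fernet-format key: 32 random bytes, url-safe base64-encoded
    # (44 characters). Same format as cryptography's Fernet.generate_key().
    import base64
    import os

    fernet_key = base64.urlsafe_b64encode(os.urandom(32)).decode()
    print(fernet_key)  # paste into airflow.fernet_key
    ```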
    