
I have Airflow DAGs running on Google Cloud Composer that train machine learning models on some training data and store the model with the best accuracy. I want to build a Docker container/image that contains the best model and deploy it directly to Google Cloud, or download the image to my local machine.

I looked at Stack Overflow answers, the Google Cloud Composer documentation, and tutorials, but they generally deal with running Airflow inside Docker or running commands inside a container created from an existing image. I want to be able to create a Docker image and then download or deploy it.

I already have a Dockerfile and the rest of the setup for creating Docker images on my local machine. What I do not know is how to create a Docker image on Cloud Composer using Airflow and then download that image.

I have a task that builds a Docker image:

def build_docker(ti, **context):

    import docker
    import os
    import subprocess

    # client = docker.from_env() ..........................................(1)

    # Pull the paths/identifiers produced by the upstream "setup" task.
    docker_folder = ti.xcom_pull(
        task_ids="setup",
        key="docker_folder",
    )
    model_id = ti.xcom_pull(
        task_ids="setup",
        key="model_id",
    )
    model_path = ti.xcom_pull(
        task_ids="setup",
        key="model_path",
    )
    model_type = ti.xcom_pull(task_ids="setup", key="model_type")

    docker_image_name = f"{model_type}:{model_id}"

    # Copy the best model into the Docker build context.
    os.chdir(docker_folder)
    os.system(f"cp {model_path} {os.path.join(docker_folder, 'best_model')}")

    print(os.getcwd())

    # Build the image from the Dockerfile in the current directory.
    # client.images.build(path=".", tag=docker_image_name) ................(2)
    output = subprocess.run(
        f"docker build -t {docker_image_name} .",
        shell=True,
        capture_output=True,
        encoding="utf-8",
    )
    print(output)

If I run this task locally, I can see that a Docker image is built and I can create and run containers from it. I cannot do the same in Google Cloud Composer; I get the error: command "docker" not found.

To bypass this, I installed the docker PyPI package and uncommented lines (1) and (2), but then I get the error

sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

which, as far as I can tell, means the Docker SDK cannot reach a Docker daemon socket (/var/run/docker.sock) on the Composer worker.

2 Answers


  1. You can also use a PythonOperator in your Airflow task:

    • Your PythonOperator can invoke your Python ML program (a complete DAG sketch follows below):

    PythonOperator(
        task_id="train_ml_model",
        op_kwargs={
            'my_param1': 'my_param1_value',
            'my_param2': 'my_param2_value'
        },
        python_callable=train_ml_model
    )

    def train_ml_model(my_param1, my_param2):
        # Your ML program
        ...

    • In the PyPI packages section of Cloud Composer you can add all the Python packages needed by your ML program:

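    Putting the pieces together, a minimal sketch of a complete DAG (assuming Airflow 2.x on Composer; the dag_id, start date and parameter values are placeholders):

    import pendulum

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def train_ml_model(my_param1, my_param2):
        # Your ML program goes here; any libraries it needs must be
        # installed through Composer's PyPI packages page.
        print(my_param1, my_param2)

    with DAG(
        dag_id="train_ml_model_dag",          # placeholder
        start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
        schedule_interval=None,               # trigger manually
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="train_ml_model",
            op_kwargs={
                "my_param1": "my_param1_value",
                "my_param2": "my_param2_value",
            },
            python_callable=train_ml_model,
        )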

  2. Building Docker images within Docker (Docker-in-Docker, or dind) is not recommended, and you don’t have access to the configuration of the containers in Composer that would allow you to run dind.

    I would recommend exporting your ML model to Cloud Storage and then running a GCP Cloud Build job to build and push the Docker image; a sketch of that approach follows below.
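
    A minimal sketch of that approach as a replacement for the build_docker task (the project id, bucket name and image path are placeholders, and it assumes the google-cloud-storage and google-cloud-build PyPI packages are installed in the Composer environment):

    import os
    import shutil
    import tarfile

    from google.cloud import storage
    from google.cloud.devtools import cloudbuild_v1

    PROJECT_ID = "my-project"                   # placeholder
    BUCKET = "my-build-sources"                 # placeholder
    IMAGE = f"gcr.io/{PROJECT_ID}/best-model"   # placeholder

    def submit_build(ti, **context):
        docker_folder = ti.xcom_pull(task_ids="setup", key="docker_folder")
        model_id = ti.xcom_pull(task_ids="setup", key="model_id")
        model_path = ti.xcom_pull(task_ids="setup", key="model_path")

        # Copy the best model into the Docker build context, as before.
        shutil.copy(model_path, os.path.join(docker_folder, "best_model"))

        # Pack the build context (Dockerfile + best_model) into a tarball.
        tarball = f"/tmp/context-{model_id}.tar.gz"
        with tarfile.open(tarball, "w:gz") as tar:
            tar.add(docker_folder, arcname=".")

        # Upload the build context to Cloud Storage.
        obj = f"contexts/{model_id}.tar.gz"
        storage.Client().bucket(BUCKET).blob(obj).upload_from_filename(tarball)

        # Have Cloud Build (not the Composer worker) build and push the image.
        build = cloudbuild_v1.Build(
            source=cloudbuild_v1.Source(
                storage_source=cloudbuild_v1.StorageSource(bucket=BUCKET, object_=obj)
            ),
            steps=[
                cloudbuild_v1.BuildStep(
                    name="gcr.io/cloud-builders/docker",
                    args=["build", "-t", f"{IMAGE}:{model_id}", "."],
                )
            ],
            images=[f"{IMAGE}:{model_id}"],  # pushed automatically after the steps
        )
        cloudbuild_v1.CloudBuildClient().create_build(project_id=PROJECT_ID, build=build)

    Once the build succeeds, the image lives in the registry, so you can docker pull it to your local machine or deploy it to Cloud Run/GKE. The Google provider for Airflow also ships a CloudBuildCreateBuildOperator that can submit the build for you instead of a PythonOperator.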
