I have Airflow DAGs running on Google Cloud Composer that train machine learning models on some training data and store the model with the best accuracy. I want to build a Docker container/image that contains the best model and either deploy it directly to Google Cloud or download the image to my local machine.
I looked at StackOverflow answers, Google Cloud Composer documentation and tutorials, but they generally deal with running Airflow inside Docker or running commands inside a container created from an existing Docker image. I want to create a Docker image and then download/deploy it.
I already have a Dockerfile and the rest of the setup for creating Docker images on my local machine. What I do not know is how to create a Docker image on Cloud Composer using Airflow and then download that image.
Here is the task that builds the Docker image:
def build_docker(ti, **context):
    import docker
    import os
    import subprocess

    # client = docker.from_env() ..........................................(1)

    # Pull the artifacts produced by the upstream "setup" task.
    docker_folder = ti.xcom_pull(task_ids="setup", key="docker_folder")
    model_id = ti.xcom_pull(task_ids="setup", key="model_id")
    model_path = ti.xcom_pull(task_ids="setup", key="model_path")
    model_type = ti.xcom_pull(task_ids="setup", key="model_type")

    docker_image_name = f"{model_type}:{model_id}"

    # Copy the best model into the Docker build context.
    os.chdir(docker_folder)
    os.system(f"cp {model_path} {os.path.join(docker_folder, 'best_model')}")
    print(os.getcwd())

    # client.images.build(path=".", tag=docker_image_name) ................(2)
    output = subprocess.run(
        f"docker build -t {docker_image_name} .",
        shell=True,
        capture_output=True,
        encoding="utf-8",
    )
    print(output)
If I run this task locally, I can see that a Docker image is built and I can create and run containers from it. I cannot do the same in Google Cloud Composer: I get the error command "docker" not found.
To bypass this, I installed the docker PyPI package and uncommented lines (1) and (2), but then I get the error
sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory
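For reference, docker.from_env() talks to the local Docker daemon over the unix socket /var/run/docker.sock by default, and no daemon runs inside a Composer worker, so the FileNotFoundError above is just the client failing to open that socket. A quick (hypothetical) check:

import os

# On a Cloud Composer worker this prints False: the daemon socket that
# docker.from_env() tries to open does not exist there.
print(os.path.exists("/var/run/docker.sock"))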
Answers
You can also use a PythonOperator in your Airflow task: the PythonOperator can invoke your Python ML program directly on the worker, and from Cloud Composer you can add all the needed Python (PyPI) packages for your ML program. A minimal sketch follows.
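A sketch of that idea, assuming Airflow 2 import paths (they differ on Composer environments running Airflow 1), a joblib-serialized model, and the same upstream "setup" task as in the question; every other name here is hypothetical, not from the original answer:

from airflow.operators.python import PythonOperator

def run_ml_program(ti, **context):
    # Load the best model chosen by the upstream "setup" task and use it
    # right here on the Airflow worker, instead of baking it into an image.
    import joblib

    model_path = ti.xcom_pull(task_ids="setup", key="model_path")
    model = joblib.load(model_path)  # assumes the model was saved with joblib
    # ... run predictions / the rest of your ML program with model ...

run_ml_task = PythonOperator(  # declared inside your existing DAG
    task_id="run_ml_program",
    python_callable=run_ml_program,
)

Any PyPI packages the callable imports (joblib, scikit-learn, and so on) can be installed into the Composer environment, for example with gcloud composer environments update --update-pypi-package, so this path needs no container at all.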
Building Docker images within Docker (dind) is not recommended, and you don't have access to the configuration of the containers in Composer that would allow you to run dind. I would recommend exporting your ML model to Cloud Storage and then running a GCP Cloud Build job to build and push the Docker image.
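A sketch of that route, assuming the gcloud CLI is available on the worker and using a placeholder project id my-project; Cloud Build uploads the build context itself and builds remotely, so no Docker daemon is needed on the Composer side:

def build_with_cloud_build(ti, **context):
    import shutil
    import subprocess

    docker_folder = ti.xcom_pull(task_ids="setup", key="docker_folder")
    model_path = ti.xcom_pull(task_ids="setup", key="model_path")
    model_id = ti.xcom_pull(task_ids="setup", key="model_id")
    model_type = ti.xcom_pull(task_ids="setup", key="model_type")

    # Stage the best model next to the Dockerfile so the build can COPY it.
    shutil.copy(model_path, f"{docker_folder}/best_model")

    # "my-project" is a placeholder. Cloud Build tars up docker_folder,
    # builds the image remotely, and pushes it to the registry.
    image = f"gcr.io/my-project/{model_type}:{model_id}"
    subprocess.run(
        ["gcloud", "builds", "submit", "--tag", image, docker_folder],
        check=True,
    )

From the registry you can deploy the image (Cloud Run, GKE, and so on) or pull it to your local machine with docker pull. The same build can also be triggered without shelling out, via the CloudBuildCreateBuildOperator from the Google provider package.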