I have a set of ETL tasks that I’d like to run within Google Cloud Run jobs. There are five Python jobs I’d like to submit, namely:
- all_dividends_history.py
- all_ticker_types.py
- all_tickers.py
- all_tickers_detail.py
- all_tickers_history.py
Using this Dockerfile:
FROM python:3.11
RUN apt-get update -y
#RUN apt-get install -y python-pip python-dev build-essential
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
#ENTRYPOINT ["python"]
CMD ["/bin/bash"]
I can run this successfully on my local machine using the following command. The critical thing to note is that my Dockerfile only issues a CMD for bash, and I invoke docker run followed by python {name of job}. This allows me to use the same Dockerfile and container to execute each of these tasks as independent jobs that can run in parallel. If possible, I'd like to avoid building five separate containers.
docker run \
  -v "$GOOGLE_APPLICATION_CREDENTIALS":/tmp/keys/new-life-400922-cd595a9f5804.json:ro \
  -e GOOGLE_APPLICATION_CREDENTIALS=/tmp/keys/new-life-400922-cd595a9f5804.json \
  -e POLYGON_API_KEY="$POLYGON_API_KEY" \
  test python scrapers/polygon/all_ticker_types.py
I’m trying to port this over to Google Cloud Run, and from the GCP GUI I thought I’d be able to issue a command like python scrapers/polygon/all_ticker_types.py. This is not the case, however: it complains that it has no clue what to do with /app/python scrapers/polygon/all_ticker_types.py
I noticed here https://cloud.google.com/run/docs/reference/rest/v1/Container that commands are not issued inside a shell, which makes me wonder whether what I’m trying to do is possible within Cloud Run. Is it possible for me to share the same Dockerfile / container for multiple scripts and call them using python {name of job}? If so, can you help me understand what I’m doing wrong here, or what additional information would be needed to answer that? If it’s not possible or advisable to do what I’m trying to do, would you please correct me and advise me on a better approach to this problem?
2 Answers
1. Modify Your Dockerfile:
In your Dockerfile, replace the CMD instruction with a custom entrypoint script that accepts command-line arguments. For example, copy the script into the image, make it executable, and point the Dockerfile at it with an instruction such as ENTRYPOINT ["/app/entrypoint.sh"].
2. Create an Entrypoint Script:
Create an entrypoint script (e.g., entrypoint.sh) in your project directory. The script checks the command passed as an argument and executes the corresponding Python script.
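A minimal sketch of such a dispatcher follows. The file name entrypoint.sh, the JOB_NAMES list, and the SCRIPT_DIR default are assumptions made for illustration: the question only confirms that all_ticker_types.py lives under scrapers/polygon/, so the location of the other four scripts is a guess.

```shell
#!/bin/sh
# entrypoint.sh (hypothetical): map a job name to its Python script and run it.

# Job names taken from the question; SCRIPT_DIR is an assumed location.
JOB_NAMES="all_dividends_history all_ticker_types all_tickers all_tickers_detail all_tickers_history"
SCRIPT_DIR="${SCRIPT_DIR:-scrapers/polygon}"

# Print the script path for a known job name, or fail for an unknown one.
resolve_job() {
  for name in $JOB_NAMES; do
    if [ "$name" = "$1" ]; then
      printf '%s/%s.py\n' "$SCRIPT_DIR" "$name"
      return 0
    fi
  done
  echo "unknown job: $1" >&2
  return 1
}

if [ "$#" -eq 0 ]; then
  # No job requested: list what is available instead of guessing.
  echo "available jobs: $JOB_NAMES"
else
  script=$(resolve_job "$1") || exit 64
  shift
  # Replace the shell with Python so signals and exit codes pass straight through.
  exec python "$script" "$@"
fi
```

With the image's ENTRYPOINT pointing at this script, a container argument of all_ticker_types would run python scrapers/polygon/all_ticker_types.py, and any extra arguments are forwarded to the script.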
3. Build and Push the Docker Image:
Build your Docker image and push it to a container registry such as Artifact Registry (the successor to Google Container Registry) or Docker Hub. Make sure to tag the image appropriately.
4. Deploy on Google Cloud Run:
After pushing the image, create a Cloud Run job from it, either in the GCP console or with the gcloud run jobs create command (gcloud run deploy creates an HTTP service, which is not what batch tasks need). When creating the job, specify the job name, container image, and other configuration options.
5. Execute a Specific Job:
You can select a specific script by passing its name as a container argument when you create or update the Cloud Run job. For example, passing the name of all_ticker_types.py as the argument runs that script in the job's container.
This approach allows you to use a single Docker container and Dockerfile to run multiple Python scripts as independent jobs on Google Cloud Run, addressing your desire to avoid building five separate containers.
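As a rough illustration of the single-image approach, the sketch below prints one gcloud run jobs create command per script so the commands can be reviewed before running them. The image path is a placeholder, the Cloud Run job names are simply the script names with underscores replaced by hyphens, and the --args value assumes an entrypoint that accepts the bare job name; none of these come from the original answer.

```shell
#!/bin/sh
# Placeholder image reference; substitute your own registry path.
IMAGE="${IMAGE:-REGION-docker.pkg.dev/PROJECT/REPO/etl:latest}"
JOB_NAMES="all_dividends_history all_ticker_types all_tickers all_tickers_detail all_tickers_history"

# Print the command that would create one Cloud Run job for one script.
create_job_command() {
  # Cloud Run resource names cannot contain underscores, so use hyphens.
  job_id=$(printf '%s' "$1" | tr '_' '-')
  printf 'gcloud run jobs create %s --image=%s --args=%s\n' "$job_id" "$IMAGE" "$1"
}

# Dry run: print all five commands, one job per script, all sharing one image.
for name in $JOB_NAMES; do
  create_job_command "$name"
done
```

The printed commands can be pasted into a terminal once they look right. gcloud also needs a region, either from --region or from your configured default.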
From your question, I understand that you want to run jobs. You mention Cloud Run, which usually refers to services that handle HTTP requests with an HTTP server, but that doesn't seem to be what you mean.
I assume you are talking about Cloud Run jobs, which let you run your ETL tasks on a serverless platform outside of an HTTP request context.
In that context, there is a brand new feature in Cloud Run jobs named "parameter override". It lets you change the arguments and the environment variable values for a specific execution.
Building on Nilden's recommendation to update your Dockerfile, you can do something even simpler: use Python itself as the entrypoint, i.e. enable the ENTRYPOINT ["python"] line that is currently commented out in your Dockerfile.
Then, when you execute the Cloud Run job with a parameter override, pass the script path as the argument to the "python" executable (the entrypoint).
The container then runs the entrypoint followed by that argument, for example python scrapers/polygon/all_ticker_types.py
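A small sketch of that override, assuming the image uses python as its entrypoint. The job name polygon-etl is hypothetical, and the GCLOUD variable exists only so the wrapper can be dry-run with echo; the --args flag of gcloud run jobs execute is what carries the per-execution argument.

```shell
#!/bin/sh
# Hypothetical job name; replace with the name of your Cloud Run job.
JOB_NAME="${JOB_NAME:-polygon-etl}"
# Set GCLOUD=echo to print the arguments instead of calling gcloud.
GCLOUD="${GCLOUD:-gcloud}"

# Start one execution of the job, overriding the container arguments
# with the path of the script to run (relative to WORKDIR /app).
execute_script() {
  "$GCLOUD" run jobs execute "$JOB_NAME" --args="$1"
}

# Example call (commented out so that sourcing this file has no side effects):
# execute_script scrapers/polygon/all_ticker_types.py
```

Each call starts a separate execution, so the five scripts can share one job definition and one image. The same command also accepts --update-env-vars if an execution needs different environment variable values.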