Context
I am running a cron job (a Python ETL script) via a Docker container. Every day at 12:30 AM, cron runs:
docker run $IMAGE
In the Dockerfile I start the script like this:
# Run the script at container boot time.
CMD ["./run_manager.sh"]
This is what `run_manager.sh` looks like:
python3 main.py >> main.log 2>&1
I am using the Python `logging` module like this:
#!/usr/bin/env python3
# encoding: utf-8
"""
This file contains the script
"""
import logging
from contextlib import AbstractContextManager
import polars as pl
import tensorflow as tf
import sqlalchemy as sa
logging.basicConfig(format='%(asctime)s|%(levelname)s: %(message)s',
datefmt='%H:%M:%S, %d-%b-%Y', level=logging.INFO)
...
# other code follows
Question
Since the container is ephemeral (created and destroyed every day when the cron job is triggered), I have no way to access the log. How can I change this so that the logs persist, rotate, and are visible outside the container? Is there a way?
Addendum
Right now it runs as a cron job on an on-prem Ubuntu instance, but I am going to migrate it to Google Cloud Scheduler soon, keeping the design intact as much as possible. Is there a solution for that case as well, i.e. a way to see the logs of past jobs?
Answers
In a container you usually don’t log to a file. Since the container has an isolated filesystem, it can be tricky to extract the log file. The more common setup is to have the container log to stdout.
With what you’ve shown, the `logging` module already writes its messages to the console (stderr by default) rather than to a file, so you just need to remove the redirection in your wrapper script. If that’s the only thing the wrapper script does, you don’t even need it; you can remove the wrapper script entirely and just have `CMD ["./main.py"]` in your Dockerfile. (The script already has a correct "shebang" line, so you don’t need to invoke `python3` explicitly on the command line; you also need to make sure you’ve run `chmod +x main.py` on the host system to mark it as executable. An `ENV PYTHONUNBUFFERED=1` line keeps Python from buffering log messages internally; also see Why doesn’t Python app print anything when run in a detached docker container?)
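As a minimal sketch, the relevant Dockerfile lines under that setup would look like this (whatever base image, COPY, and WORKDIR instructions you already have are assumed to stay as they are):
# Don't buffer stdout/stderr, so log lines show up in `docker logs` immediately.
ENV PYTHONUNBUFFERED=1
# Run the script directly; the shebang line picks python3.
CMD ["./main.py"]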
In the form you currently show, `docker run` will print the logs directly to its own stdout. If your cron daemon is set up to email the results of cron jobs, you’ll get the logs in email. More generally, you can retrieve these logs with `docker logs` so long as the container isn’t deleted.
In a cloud environment, this is the "normal" way of getting logs out of a container process. If you ran this in Kubernetes, for example, you’d use `kubectl logs` rather than `docker logs`, but the underlying mechanism is still the same. I’d expect that anything capable of running a container and reporting logs will work if you log to stdout and not to a file.
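For illustration, here is one way the cron entry could keep each day's output reachable (the container naming scheme is just an example, not part of the original setup):
# Name the container after the date and don't pass --rm, so the container
# and its captured stdout survive the run.
docker run --name "etl-$(date +%F)" "$IMAGE"

# Later, read the output of that day's run.
docker logs "etl-$(date +%F)"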
Easy solution for on-prem
You can simply bind-mount the log file to a path on the host machine, so that the log persists on the host instead of being destroyed with the container.
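For example (a sketch: the in-container path `/app/main.log` is an assumption and depends on where your script actually writes its log file):
# The host file must already exist, e.g. create it once with:
#   touch /path/to/persist/main.log
docker run \
  --mount type=bind,source=/path/to/persist/main.log,target=/app/main.log \
  "$IMAGE"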
In the command above, replace the `source` path `/path/to/persist/main.log` with the absolute path on the host where you want to persist `main.log`.
So even after the container is destroyed, the logs persist on your local host machine. If a new container is spun up, it mounts the same file and appends its logs to it.
Best solution for both on-prem and GCP
If you are moving to the cloud, you can configure your application to push the logs to a remote location using a solution like Fluentd.
To learn more about Fluentd’s Python integration, see https://docs.fluentd.org/language-bindings/python
Considering your use case, I would suggest Fluentd: it is a dedicated logging solution and can be used across multiple platforms.
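A minimal sketch of wiring your existing `logging` setup to a Fluentd agent, assuming the `fluent-logger` package (`pip install fluent-logger`) and an agent reachable at `fluentd-host:24224`; the tag `etl.main` and the host name are placeholders to adjust:
import logging
from fluent import handler  # provided by the fluent-logger package

logging.basicConfig(format='%(asctime)s|%(levelname)s: %(message)s',
                    datefmt='%H:%M:%S, %d-%b-%Y', level=logging.INFO)

# Forward every log record to the Fluentd agent in addition to the console.
fluent_handler = handler.FluentHandler('etl.main', host='fluentd-host', port=24224)
fluent_handler.setFormatter(handler.FluentRecordFormatter())
logging.getLogger().addHandler(fluent_handler)

logging.info('ETL run started')  # appears both locally and in Fluentd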