
Context

I am running a cron job (a Python ETL script) via a Docker container. Every day at 12:30 am, my cron job runs

docker run $IMAGE 

In the Dockerfile I run the script like this:

# Run the script at container boot time.
CMD ["./run_manager.sh"]

This is what `run_manager.sh` looks like:

python3 main.py >> main.log 2>&1

I am using the Python logging module like this:

#!/usr/bin/env python3
# encoding: utf-8

"""
This file contains the script
"""
import logging
from contextlib import AbstractContextManager
import polars as pl
import tensorflow as tf
import sqlalchemy as sa

logging.basicConfig(format='%(asctime)s|%(levelname)s: %(message)s',
                    datefmt='%H:%M:%S, %d-%b-%Y', level=logging.INFO)

...
# Other codes

Question

Since the container is ephemeral (created and destroyed every day when the cron triggers), I have no way to access the log. How can I change this so that the logs persist, rotate, and are visible outside the container? Is there a way?

Addendum

Right now it runs as a cron job on an on-prem Ubuntu instance, but I am going to migrate it to Google Cloud Scheduler very soon, keeping the design intact as much as possible. Is there a solution in that case as well, i.e. a way to see the logs of past jobs?

2 Answers


  1. In a container you usually don’t log to a file. Since the container has an isolated filesystem, it can be tricky to extract the log file. The more common setup is to have the container log to stdout.

    With what you’ve shown, the logging module already writes to the console (to stderr by default, which Docker captures just like stdout), so you just need to remove the redirection in your wrapper script. If that’s the only thing the wrapper script does, you don’t even need it; you can remove the wrapper script entirely and just have

    ENV PYTHONUNBUFFERED=1
    CMD ["./main.py"]
    

    in your Dockerfile. (The script already has a correct "shebang" line, so you don’t need to explicitly invoke python3 on the command line, but you also need to make sure you’ve run chmod +x main.py on the host system to mark it as executable. The ENV line turns off Python’s output buffering so log messages show up immediately instead of being held inside the process; also see Why doesn’t Python app print anything when run in a detached docker container?)

    In the form you currently show, docker run will print the logs directly to its own stdout. If your cron daemon is set up to email the results of cron jobs, you’ll get the logs in email. More generally, you can retrieve these logs with docker logs so long as the container isn’t deleted.

    In a cloud environment, this is the "normal" way of getting logs out of a container process. If you ran this in Kubernetes, for example, you’d use kubectl logs rather than docker logs but the underlying mechanism is still the same. I’d expect that anything capable of running a container and reporting logs will work if you log to stdout and not a file.
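    If you want to be explicit about where the log records go rather than relying on the stderr default, you can also point basicConfig at stdout. A minimal sketch, reusing the format strings from your script:

    import sys
    import logging

    # Send log records explicitly to stdout so `docker run` / `docker logs`
    # capture them without any file redirection.
    logging.basicConfig(
        stream=sys.stdout,
        format='%(asctime)s|%(levelname)s: %(message)s',
        datefmt='%H:%M:%S, %d-%b-%Y',
        level=logging.INFO,
    )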

  2. Easy solution for on-prem

    You can bind-mount the log file to a path on the host machine, so that the log persists locally even after the container is destroyed.

    docker run --mount type=bind,source=/path/to/persist/main.log,target=/path/in/container/main.log $IMAGE
    

    In the above command, replace source=/path/to/persist/main.log with the absolute host path where you want to persist main.log, and target=/path/in/container/main.log with the absolute path inside the container where the script writes main.log (Docker requires the mount target to be an absolute path). Make sure the host file exists first; with --mount, Docker will not create it for you.

    So even after the container is destroyed, the logs persist on the host machine. When a new container is spun up, it mounts the same file and appends new logs to it.
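    The question also asks about rotation. A rough sketch of one way to get it: bind-mount a whole directory instead of a single file (say, a hypothetical /logs inside the container backed by a host directory; a directory is easier because rotation renames files), and swap basicConfig's default handler for a RotatingFileHandler:

    import logging
    from logging.handlers import RotatingFileHandler

    # Rough sketch, not from the original script: write into the bind-mounted
    # directory (the hypothetical /logs) and rotate at ~5 MB, keeping 7 old files.
    handler = RotatingFileHandler('/logs/main.log', maxBytes=5_000_000, backupCount=7)
    logging.basicConfig(
        handlers=[handler],
        format='%(asctime)s|%(levelname)s: %(message)s',
        datefmt='%H:%M:%S, %d-%b-%Y',
        level=logging.INFO,
    )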

    Best solution for both on-prem and GCP

    If you are moving to the cloud, you can configure your application to push the logs to a remote location using a solution like Fluentd.

    To learn more about fluentd – https://docs.fluentd.org/language-bindings/python

    Considering your use case, I would suggest using Fluentd, which is a dedicated logging solution and can be leveraged across multiple platforms.
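    As a rough sketch of what the Python side can look like, using the fluent-logger package documented at the link above (the tag, host and port below are placeholders, not values from the question):

    import logging
    from fluent import handler  # pip install fluent-logger

    # Placeholder values: forward log records to a Fluentd agent listening on
    # fluentd.example.com:24224, tagged "etl.main".
    fluent_handler = handler.FluentHandler('etl.main', host='fluentd.example.com', port=24224)
    fluent_handler.setFormatter(handler.FluentRecordFormatter())

    logger = logging.getLogger('etl')
    logger.setLevel(logging.INFO)
    logger.addHandler(fluent_handler)
    logger.info('ETL run started')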
