
I’m trying to convert HTML files to PDFs in a Docker container.
Dockerfile:

FROM python:3.8

# Adding trusting keys to apt for repositories
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -

# Adding Google Chrome to the repositories
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'

# Updating apt to see and install Google Chrome
RUN apt-get -y update

# Magic happens
RUN apt-get install -y google-chrome-stable

COPY /requirements.txt /app/requirements.txt

WORKDIR /app

RUN pip3 install -r requirements.txt

COPY . /app

CMD [ "python3", "app.py" ]

app.py

import subprocess

html_file_url = "file:///html_file_name.html"
pdf_file_path = "pdf_file_name.pdf"
commands = [
    "google-chrome",
    "--headless",
    "--disable-gpu",
    "--no-sandbox",
    "--print-to-pdf={}".format(pdf_file_path),
    html_file_url,
]
subprocess.run(commands)

On running the Docker container, I’m getting:

[0404/112836.835631:ERROR:bus.cc(397)] Failed to connect to the bus: Failed to connect to socket /host/run/dbus/system_bus_socket: No such file or directory
[0404/112836.835729:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable.
[0404/112836.835786:ERROR:bus.cc(397)] Failed to connect to the bus: Failed to connect to socket /host/run/dbus/system_bus_socket: No such file or directory
[0404/112836.866694:ERROR:sandbox_linux.cc(377)] InitializeSandbox() called with multiple threads in process gpu-process.

The generated PDF is empty. The HTML uses recent CSS features like flexbox, which Python packages such as xhtml2pdf and pdfkit fail to convert, so I’m trying to use headless Google Chrome.

2 Answers


  1. Building a Docker image with Chrome can be tricky because extra binaries may be required, and which ones depends on the Linux flavor and the Chrome version.

    Consider starting from a popular image that already includes Chrome, e.g.
    https://github.com/puppeteer/puppeteer/blob/main/docker/Dockerfile

    FROM node:18
    
    # Install latest chrome dev package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
    # Note: this installs the necessary libs to make the bundled version of Chromium that Puppeteer
    # installs, work.
    RUN apt-get update \
        && apt-get install -y wget gnupg \
        && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor -o /usr/share/keyrings/googlechrome-linux-keyring.gpg \
        && sh -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/googlechrome-linux-keyring.gpg] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
        && apt-get update \
        && apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-khmeros fonts-kacst fonts-freefont-ttf libxss1 \
          --no-install-recommends \
        && rm -rf /var/lib/apt/lists/* \
        && groupadd -r pptruser && useradd -rm -g pptruser -G audio,video pptruser
    
    USER pptruser
    
    WORKDIR /home/pptruser
    

    This way you can learn from the results of other open source projects,
    including the idea that it is better to have fewer RUN instructions, and therefore fewer layers in the resulting image.

  2. Forgive me if I am wrong, but so that they are not lost across several comments, here are collected snippets of answers to several issues raised by the OP's question.

      1. Debian did not have a native Chrome package, so a specific download is needed to install it. Currently, for a recent 10/11 system, that is
        wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
        but you should check the current status for your own Google/Debian versions; the latest .rpm or .deb is available from https://www.google.com/chrome/?platform=linux
      2. The very old "--disable-gpu" switch was last needed on Windows systems about 5 years ago and should not be needed with a modern "--headless" Brave/Chrome/Chromium/Edge.
      3. Chrome has rewritten some headless switches since the OP's question, so current combinations may differ, e.g. with --headless=new
      4. --headless can be exceptionally pedantic about input and output file locations and access/write rights, and will simply produce a blank result without much warning if either of the two is wrong, or if a default write to the profile folder was attempted.
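
To make the points in the second answer concrete, here is a minimal Python sketch of how app.py might call Chrome with absolute input and output locations and then check that a non-empty PDF was actually written. The helper names (find_chrome, build_chrome_command, html_to_pdf) and the list of candidate binary names are hypothetical, not from the thread. It also assumes a Chrome build that accepts --headless=new; on an older build, plain --headless may be what is needed.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical list of binary names to try; adjust for your image.
CHROME_CANDIDATES = ("google-chrome", "google-chrome-stable", "chromium")


def find_chrome(candidates=CHROME_CANDIDATES):
    """Return the first Chrome-like binary found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_chrome_command(chrome, html_path, pdf_path):
    """Build the argument list using absolute input and output locations."""
    html_url = Path(html_path).resolve().as_uri()  # absolute file:// URL
    pdf_abs = Path(pdf_path).resolve()             # absolute output path
    return [
        chrome,
        "--headless=new",  # assumption: this Chrome build accepts the new headless mode
        "--no-sandbox",    # kept from the question
        f"--print-to-pdf={pdf_abs}",
        html_url,
    ]


def html_to_pdf(html_path, pdf_path, timeout=60):
    """Run Chrome and fail loudly instead of silently leaving a blank result."""
    chrome = find_chrome()
    if chrome is None:
        raise FileNotFoundError("no Chrome binary found on PATH")
    if not Path(html_path).is_file():
        raise FileNotFoundError(f"input HTML not found: {html_path}")

    # Make sure the output directory exists before Chrome tries to write.
    Path(pdf_path).resolve().parent.mkdir(parents=True, exist_ok=True)

    result = subprocess.run(
        build_chrome_command(chrome, html_path, pdf_path),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    out = Path(pdf_path).resolve()
    if result.returncode != 0 or not out.is_file() or out.stat().st_size == 0:
        raise RuntimeError(f"Chrome did not produce a PDF:\n{result.stderr}")
    return out
```

Usage would be something like html_to_pdf("/app/html_file_name.html", "/app/out/pdf_file_name.pdf"). Resolving both paths up front means a wrong location raises an error immediately, and the size check catches the case where Chrome exits without writing anything.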