I’m trying to convert HTML files to PDFs in a Docker container.
Dockerfile:
FROM python:3.8
# Adding trusting keys to apt for repositories
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
# Adding Google Chrome to the repositories
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
# Updating apt to see and install Google Chrome
RUN apt-get -y update
# Magic happens
RUN apt-get install -y google-chrome-stable
COPY /requirements.txt /app/requirements.txt
WORKDIR /app
RUN pip3 install -r requirements.txt
COPY . /app
CMD [ "python3", "app.py" ]
app.py:
import subprocess

html_file_url = "file:///html_file_name.html"
pdf_file_path = "pdf_file_name.pdf"
commands = [
"google-chrome",
"--headless",
"--disable-gpu",
"--no-sandbox",
"--print-to-pdf={}".format(pdf_file_path),
html_file_url,
]
subprocess.run(commands)
When I run the container, I get:
[0404/112836.835631:ERROR:bus.cc(397)] Failed to connect to the bus: Failed to connect to socket /host/run/dbus/system_bus_socket: No such file or directory
[0404/112836.835729:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable.
[0404/112836.835786:ERROR:bus.cc(397)] Failed to connect to the bus: Failed to connect to socket /host/run/dbus/system_bus_socket: No such file or directory
[0404/112836.866694:ERROR:sandbox_linux.cc(377)] InitializeSandbox() called with multiple threads in process gpu-process.
The generated PDF is empty. The HTML uses recent CSS features such as flexbox, which Python packages like xhtml2pdf and pdfkit fail to convert, so I’m trying to use headless Google Chrome instead.
2 Answers
Building a Docker image with Chrome can be tricky, because some additional binaries may be required (and they differ between Linux flavors and Chrome versions).
Consider looking at some popular images that already include Chrome and using one as your base, e.g.
https://github.com/puppeteer/puppeteer/blob/main/docker/Dockerfile
This way you can learn from the results of other open source projects,
including the idea that it is better to have fewer RUN commands, and so fewer layers in the resulting image.
Forgive me if I am wrong, but so that they are not simply lost in several comments, here are collected snippets of related answers to several issues raised by the OP's question.
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
However, you should check the current status of your own Google/Debian versions; the latest .rpm or .deb is available from https://www.google.com/chrome/?platform=linux
"--disable-gpu" was last used by Windows systems 5 years ago and should not be needed in a modern "--headless" Brave/Chrome/Chromium/Edge.
--headless=new
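Putting the snippets above together, the call in app.py could be sketched roughly as follows. This is only a sketch under some assumptions: the helper names build_chrome_pdf_command and html_to_pdf are made up for illustration, the installed Chrome is assumed to be recent enough to accept --headless=new (fall back to plain --headless if it is not), and the HTML file is assumed to sit in the working directory, so its file:// URL is built from the absolute path instead of being hard-coded.

```python
import subprocess
from pathlib import Path


def build_chrome_pdf_command(html_path, pdf_path, chrome="google-chrome"):
    """Return the argument list for a headless Chrome print-to-PDF run.

    Hypothetical helper; the name and signature are illustrative only.
    """
    # Chrome expects an absolute file:// URL; as_uri() needs an absolute path.
    html_url = Path(html_path).resolve().as_uri()
    return [
        chrome,
        "--headless=new",  # assumption: this Chrome build accepts the value
        "--no-sandbox",    # kept from the question's original command
        "--print-to-pdf={}".format(Path(pdf_path).resolve()),
        html_url,
    ]


def html_to_pdf(html_path, pdf_path):
    """Run Chrome and fail loudly if it errors or writes an empty PDF."""
    command = build_chrome_pdf_command(html_path, pdf_path)
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(command, check=True)
    output = Path(pdf_path)
    if not output.is_file() or output.stat().st_size == 0:
        raise RuntimeError("Chrome produced no PDF output: {}".format(pdf_path))
```

Calling html_to_pdf("html_file_name.html", "pdf_file_name.pdf") from /app would then pass Chrome a URL such as file:///app/html_file_name.html. "--disable-gpu" is left out on purpose, as suggested above.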