I need to scrape a website that is rendered with JavaScript. I found a working library, requests_html, that does the job. After running `pip install requests_html`, the following code gets the job done:
from requests_html import HTMLSession
url = 'https://example.com'  # placeholder; replace with the target site
session = HTMLSession()
r = session.get(url)
r.html.render(sleep=1)
print(r.html.html)
The first time it runs, it downloads Chromium in order to render the URL. However, when I try to use this code in an Alpine-based Dockerfile, I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome': '/root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome'
This is probably because that folder is not present in the Docker image. So how do I install Chromium in a Docker container? I am also not limited to this library, so if there are better ones that work in a Docker container, please let me know.
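One alternative worth mentioning (not part of the original post) is Playwright for Python, which bundles its own browser builds and a documented install command, sidestepping the pyppeteer download entirely. A minimal Dockerfile sketch, assuming a Debian-based base image:

```Dockerfile
FROM python:3.10-slim

# Playwright downloads a compatible Chromium build itself;
# --with-deps also installs the system libraries it needs
RUN pip install --no-cache-dir playwright \
 && playwright install --with-deps chromium
```

Inside the container, the rendered HTML can then be fetched with `playwright.sync_api` (`page.goto(url)` followed by `page.content()`).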
I already tried the following, but it didn't work:
FROM python:3.7-alpine3.13
RUN apk add --no-cache chromium --repository=http://dl-cdn.alpinelinux.org/alpine/v3.10/main
2 Answers
I fixed it using a different container:
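The answer's actual Dockerfile is not shown, so here is a minimal sketch of the "different container" approach, assuming a Debian-based image (pyppeteer's downloaded Chromium is linked against glibc, which Debian provides but Alpine's musl does not; the library package list is an assumption, not from the original answer):

```Dockerfile
FROM python:3.7-slim

RUN pip install --no-cache-dir requests-html

# Shared libraries Chromium typically needs at runtime on Debian slim
# (assumed list; adjust if Chromium reports missing .so files)
RUN apt-get update && apt-get install -y --no-install-recommends \
    libnss3 libatk-bridge2.0-0 libgtk-3-0 libxss1 libasound2 \
 && rm -rf /var/lib/apt/lists/*
```

With a glibc-based image, the Chromium binary that requests_html downloads on the first `render()` call can actually execute.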
If you still want to use the alpine Docker image without any error, add these lines to your Dockerfile. To avoid the FileNotFoundError on the alpine version, manual selection of the chromium-browser binary is required for Puppeteer. This is similar to the Node.js example: Docker (node:8.15-alpine) + Chromium + Karma unit tests not working
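The answer's exact Dockerfile lines are not included, so here is a commonly suggested sketch of the manual-selection approach: install Alpine's own Chromium and symlink it into the path pyppeteer expects. The revision number 588429 is taken from the error message above and may differ for other pyppeteer versions:

```Dockerfile
FROM python:3.7-alpine3.13

# Alpine's musl-compatible Chromium plus fonts/certs it needs (assumed package set)
RUN apk add --no-cache chromium nss freetype harfbuzz ca-certificates ttf-freefont

# Point pyppeteer at the system Chromium instead of the missing download
RUN mkdir -p /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux \
 && ln -s /usr/bin/chromium-browser \
          /root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome
```

The symlink makes pyppeteer find a runnable binary at the exact path from the FileNotFoundError, so it skips its own (glibc-only) Chromium download.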