I’m using the requests_html module in Python to render web pages dynamically. However, I’ve been running into an issue with the Chromium download when the render method is called (see the code snippet below):
from requests_html import HTMLSession
session = HTMLSession()
response = session.get(url)
response.html.render(timeout=20)
The error shown is: "OSError: Chromium downloadable not found at https://storage.googleapis.com/chromium-browser-snapshots/Win_x64/1181205/chrome-win.zip: Received NoSuchKey
The specified key does not exist.No such object: chromium-browser-snapshots/Win_x64/1181205/chrome-win.zip"
- I tried installing other versions of the requests_html module, but that did not help.
- I looked for a way to supply a custom executable path, but I couldn’t find one.
Please let me know if there’s a solution to this.
2 Answers
The requests_html module uses pyppeteer to handle its web automation. For whatever reason, the default URL that pyppeteer downloads Chromium from is no longer working: the NoSuchKey response means the snapshot for that revision no longer exists at that address. Normally, you’d be able to specify an executable path for pyppeteer to use, but requests_html doesn’t give you any option to pass your own executablePath argument to pyppeteer.launch. That call is made inside the BaseSession class in requests_html.
The best thing to do would be to post a new issue on the requests_html GitHub and ask them to allow you to pass your own executable path when creating a Session.
Alternatively, you could switch to using pyppeteer directly rather than through requests_html. This will give you full control over the Chromium executable path and other configuration options.
Here’s a basic example of how to use pyppeteer itself to render a webpage:
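A minimal sketch of that approach, assuming pyppeteer is installed and that '/path/to/your/chromium' stands in for a real local binary. The names render, render_sync, CHROMIUM_PATH and timeout_ms are illustrative, not part of pyppeteer or requests_html:

```python
import asyncio

# Placeholder: point this at a real local Chrome/Chromium binary.
CHROMIUM_PATH = '/path/to/your/chromium'


async def render(url, executable_path=CHROMIUM_PATH, timeout_ms=20000):
    """Return the rendered HTML of `url` using a local Chromium binary."""
    # Third-party dependency (pip install pyppeteer); imported lazily so this
    # module can be loaded even where pyppeteer is not installed.
    from pyppeteer import launch

    # Passing executablePath tells pyppeteer to use this binary instead of
    # downloading its own Chromium snapshot.
    browser = await launch(executablePath=executable_path, headless=True)
    try:
        page = await browser.newPage()
        await page.goto(url, timeout=timeout_ms)  # timeout is in milliseconds
        return await page.content()
    finally:
        # Always shut the browser down, even if navigation fails.
        await browser.close()


def render_sync(url, **kwargs):
    """Blocking wrapper for scripts that are not already running an event loop."""
    return asyncio.run(render(url, **kwargs))
```

Calling render_sync('https://example.com') should return the page HTML after JavaScript has run, roughly what response.html.render() was meant to produce.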
In this case you’d replace '/path/to/your/chromium' with the path to a locally installed Chrome or Chromium binary. You can download one from the Chromium website.

Under the hood, requests_html uses pyppeteer. By following these steps, you can work around this error:
Download Chromium manually: Get version 1181217 from Chromium snapshots. It’s slightly more recent, but you can expect it to behave the same way.
Extract to Expected Directory: Unzip it to:
Rename the folder to 1181205 (even though it’s actually version 1181217) to match what pyppeteer expects.
requests_html will detect the existing chrome.exe and skip downloading.
This bypasses the download error by mimicking the path pyppeteer expects without changing any code.
Furthermore, I recommend that you stop using requests_html and switch to something like Playwright, which is more recent.
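For the manual-download steps above, it can help to check where pyppeteer will actually look for the browser before renaming any folders. The sketch below assumes the installed pyppeteer exposes chromium_downloader.chromium_executable(), as its source does at the time of writing; expected_chromium_path and describe are hypothetical helper names:

```python
from pathlib import Path


def expected_chromium_path():
    """Ask pyppeteer where it will look for the Chromium executable."""
    # Third-party dependency (pip install pyppeteer), imported lazily.
    # Importing this module does not start a download by itself.
    from pyppeteer import chromium_downloader
    return Path(chromium_downloader.chromium_executable())


def describe(path):
    """Report whether a browser binary is already present at `path`."""
    status = 'found' if Path(path).exists() else 'missing'
    return f'{path}: {status}'


# Usage (needs pyppeteer installed):
# print(describe(expected_chromium_path()))
```

If the reported path shows as missing, that is the location the unzipped folder needs to match.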