
I’m using the requests_html module in Python to render web pages dynamically. However, the Chromium download fails when the render method is called (see the snippet below):

response = session.get(url)
response.html.render(timeout=20)

The error shown is: "OSError: Chromium downloadable not found at https://storage.googleapis.com/chromium-browser-snapshots/Win_x64/1181205/chrome-win.zip: Received NoSuchKeyThe specified key does not exist.No such object: chromium-browser-snapshots/Win_x64/1181205/chrome-win.zip"

  1. I tried installing other versions of the requests_html module, without success.
  2. I looked for a way to supply a custom executable path, but I couldn’t find one.

Please let me know if there’s a solution to this.

2 Answers


  1. The requests_html module uses pyppeteer to handle its browser automation. The default URL that pyppeteer downloads Chromium from no longer works: the NoSuchKey error means the snapshot for that revision does not exist in the storage bucket, not that an access key is required. Normally you could give pyppeteer an executable path to use, but as you can see below, requests_html offers no way to pass your own executablePath argument to pyppeteer.launch.

    Part of the BaseSession class in requests_html

    class BaseSession(requests.Session):
        @property
        async def browser(self):
            if not hasattr(self, "_browser"):
                self._browser = await pyppeteer.launch(ignoreHTTPSErrors=not(self.verify), headless=True, args=self.__browser_args)
    
            return self._browser
    

    The best thing to do would be to open an issue on the requests_html GitHub repository and ask for a way to pass your own executable path when creating a session.

    Alternatively, you could switch to using pyppeteer directly rather than through requests_html. This will give you full control over the Chromium executable path and other configuration options.

    Here’s a basic example of how to use pyppeteer itself to render a webpage:

    import asyncio
    from pyppeteer import launch
    
    async def render_page(url):
    
        browser = await launch(
            headless=True,
            executablePath='/path/to/your/chromium')
    
        page = await browser.newPage()
        await page.goto(url)
    
        content = await page.content()
        await browser.close()
    
        return content
    
    # Run the function and fetch the page content
    rendered_html = asyncio.get_event_loop().run_until_complete(
        render_page("https://www.google.com"))
    print(rendered_html)
    

    In this case you’d replace '/path/to/your/chromium' with the path to a locally installed Chrome or Chromium binary. You can download one from the Chromium website.
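If you are not sure where a local browser binary lives, a small lookup can produce the value to pass as executablePath. The following is only a sketch: `find_browser_executable` and the list of executable names are hypothetical and not part of pyppeteer or requests_html, and the names are an assumption that may not match your installation.

```python
import os
import shutil

# Assumed executable names; adjust them to whatever your system actually uses.
COMMON_NAMES = ("chromium", "chromium-browser", "google-chrome", "chrome")


def find_browser_executable(extra_candidates=()):
    """Return the first usable Chrome/Chromium path, or None if none is found."""
    # Explicit paths (for example a manually unzipped chrome.exe) take priority.
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return candidate
    # Otherwise fall back to whatever can be discovered on PATH.
    for name in COMMON_NAMES:
        found = shutil.which(name)
        if found:
            return found
    return None
```

The result, if it is not None, can be used in place of '/path/to/your/chromium' in the pyppeteer example above. Passing explicit candidate paths first keeps the behaviour predictable when several browsers are installed.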

  2. Under the hood, requests_html uses pyppeteer. By following these steps, you can work around this error:

    1. Download Chromium manually: get revision 1181217 from the Chromium snapshots. It is slightly more recent than the one pyppeteer asks for, but you can expect it to behave the same way.

    2. Extract to Expected Directory: Unzip it to:

      %USERPROFILE%/AppData/Local/pyppeteer/pyppeteer/local-chromium/1181205/chrome-win/
      

      The revision folder must be named 1181205 (even though it actually contains revision 1181217) to match what pyppeteer expects.

    3. pyppeteer, and therefore requests_html, will detect the existing chrome.exe and skip the download.

    This bypasses the download error by mimicking the path pyppeteer expects without changing any code.
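The manual steps above could also be scripted. This is a minimal sketch, assuming the downloaded archive has a top-level chrome-win/ folder and that pyppeteer's data directory is the one shown in step 2; `install_snapshot` is a hypothetical helper, not something pyppeteer provides.

```python
import zipfile
from pathlib import Path


def install_snapshot(zip_path, pyppeteer_home, expected_revision="1181205"):
    """Unpack a manually downloaded chrome-win.zip where pyppeteer looks for it.

    Assumes the archive contains a top-level chrome-win/ folder.
    """
    # The folder is named after the revision pyppeteer expects,
    # not the revision that was actually downloaded.
    target = Path(pyppeteer_home) / "local-chromium" / expected_revision
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    chrome = target / "chrome-win" / "chrome.exe"
    if not chrome.is_file():
        raise FileNotFoundError(f"chrome.exe not found at {chrome}")
    return chrome
```

On Windows it might be called as `install_snapshot("chrome-win.zip", Path.home() / "AppData/Local/pyppeteer/pyppeteer")`, which mirrors the directory from step 2. The final check fails loudly if the archive layout differs from the assumption.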

    Furthermore, I recommend that you stop using requests_html and switch to something like Playwright, which is more recent.
