
I have a small application that uses FastAPI and Playwright to scrape data and send it back to the client.
The program works properly when I run it locally, but when I run it as a Docker image it fails with the following error:

Looks like you launched a headed browser without having a XServer running.
Set either 'headless: true' or use 'xvfb-run <your-playwright-app>' before running Playwright. 

Naturally, I tried running it with headless=True, but then the code fails with this error:

net::ERR_EMPTY_RESPONSE at https://book.flygofirst.com/Flight/Select?inl=0&CHD=0&s=True&o1=BOM&d1=BLR&ADT=1&dd1=2022-12-10&gl=0&glo=0&cc=INR&mon=true
logs
navigating to "https://book.flygofirst.com/Flight/Select?inl=0&CHD=0&s=True&o1=BOM&d1=BLR&ADT=1&dd1=2022-12-10&gl=0&glo=0&cc=INR&mon=true", 
waiting until "load"

I also tried running it locally with headless=True, and it failed with a "Timeout 30000ms exceeded" error.

This is the function I’m using to return the page HTML:

    def extract_html(self):
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto('https://book.flygofirst.com/Flight/Select?inl={}&CHD={}&s=True&o1={}&d1={}&ADT={}&dd1={}&gl=0&glo=0&cc=INR&mon=true'.format(self.infants,  self.children , self.origin,  self.destination,  self.adults, self.date))
            html = page.inner_html('#sectionBody')
            return html

and this is my Dockerfile:

FROM python:3.9-slim

COPY ../../requirements/dev.txt ./

RUN python3 -m ensurepip
RUN pip install -r dev.txt
RUN playwright install 
RUN playwright install-deps 

ENV PYTHONPATH "${PYTHONPATH}:/app/"
WORKDIR /code/src

COPY ./src /app

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]

Hope someone could figure out what I’m doing wrong.

2 Answers


  1. After investigating and trying several things, it looks like the problem is the browser's user agent in headless mode: for some reason the page does not accept the default one. Try setting a user agent explicitly:

    def extract_html(self):
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            page = browser.new_page(user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36')
            page.goto('http://book.flygofirst.com/Flight/Select?inl=0&CHD=0&s=True&o1=BOM&d1=BLR&ADT=1&dd1=2022-12-10&gl=0&glo=0&cc=INR&mon=true')
            html = page.inner_html('#sectionBody')
            return html
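
The user-agent fix from this answer can be combined with the parameterized URL from the question. The following is only a sketch under some assumptions: the names `build_search_url`, `DESKTOP_USER_AGENT`, and `timeout_ms` are hypothetical, the user-agent string is simply the one quoted above, and the explicit wait for `#sectionBody` is an addition that neither post confirms is required.

```python
from urllib.parse import urlencode

BASE_URL = "https://book.flygofirst.com/Flight/Select"

# Assumption: the user agent quoted in the answer; any recent desktop
# Chrome string may work equally well.
DESKTOP_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"
)


def build_search_url(origin, destination, date, adults=1, children=0, infants=0):
    """Build the search URL with the same query parameters as the question."""
    params = {
        "inl": infants,
        "CHD": children,
        "s": "True",
        "o1": origin,
        "d1": destination,
        "ADT": adults,
        "dd1": date,
        "gl": 0,
        "glo": 0,
        "cc": "INR",
        "mon": "true",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def extract_html(url, user_agent=DESKTOP_USER_AGENT, timeout_ms=30_000):
    """Load the page headlessly with a custom user agent and return #sectionBody."""
    # Imported lazily so build_search_url works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page(user_agent=user_agent)
            page.goto(url, timeout=timeout_ms)
            # Wait for the container to exist before reading its HTML.
            page.wait_for_selector("#sectionBody", timeout=timeout_ms)
            return page.inner_html("#sectionBody")
        finally:
            # Always release the browser, even if navigation fails.
            browser.close()
```

A call such as `extract_html(build_search_url("BOM", "BLR", "2022-12-10"))` would then request the same page as in the question. Whether the site keeps accepting this particular user agent is not something the sketch can guarantee.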
    
  2. Locally it works because the GUI libraries needed to open a browser (especially with headless=False) are already installed.
    In a Docker environment, additional steps are required. I resolved it this way:

    Dockerfile:

    # Replace {latest_stable_version} with the current stable release (in my case `30.0`)
    FROM mcr.microsoft.com/playwright/python:v1.{latest_stable_version}-focal
    
    RUN apt-get update && apt-get upgrade -y
    RUN apt-get install -y xvfb
    RUN apt-get install -qqy x11-apps
    
    # chromium dependencies
    RUN apt-get install -y libnss3 \
                           libxss1 \
                           libasound2 \
                           fonts-noto-color-emoji
    
    # additional actions related to your project
    
    # exactly this kind of magic command :)
    ENTRYPOINT ["/bin/sh", "-c", "/usr/bin/xvfb-run -a $@", ""]
    

    docker-compose.yml:

      service_name:
        build: . 
        init: true
        command: # command depending on a project
        environment:
          - DISPLAY=:0
        volumes:
          - /tmp/.X11-unix:/tmp/.X11-unix
    

    Hope this helps.
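
On the Python side, the error from the question ("launched a headed browser without having a XServer running") can be avoided by requesting a headed browser only when a display is actually available, for example when the process runs under `xvfb-run`, which sets `DISPLAY` for the command it wraps. This is a rough sketch rather than part of either answer; `has_display`, `launch_options`, and `open_browser` are hypothetical helper names.

```python
import os


def has_display(env=None):
    """Return True when an X display is advertised through DISPLAY."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY"))


def launch_options(want_headed, env=None):
    """Choose keyword arguments for chromium.launch().

    A headed browser is requested only if the caller wants one AND a
    display exists; otherwise the browser falls back to headless mode.
    """
    return {"headless": not (want_headed and has_display(env))}


def open_browser(playwright, want_headed=False, env=None):
    """Launch Chromium on an already started Playwright instance."""
    return playwright.chromium.launch(**launch_options(want_headed, env))
```

Inside `with sync_playwright() as p:` this would be used as `browser = open_browser(p, want_headed=True)`. With the `xvfb-run` entrypoint in place the browser starts headed against the virtual display, and without any display it starts headless instead of failing at launch.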
