Deploying a Selenium/Flask/Docker script in Python

cd91
May 20, 2024

I'm having some issues deploying a Selenium scraping script on Render. The script runs fine locally, but when I deploy it on Render and hit the endpoint that triggers the script, I get this error:

WebDriverException
selenium.common.exceptions.WebDriverException: Message: Service /root/.cache/selenium/chromedriver/linux64/125.0.6422.60/chromedriver unexpectedly exited. Status code was: 127

Here is a copy of the script and its supporting files. If something is unclear, please let me know:

wiki_script.py :

# BeautifulSoup imports
from bs4 import BeautifulSoup

# Selenium imports
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

# Flask imports
from flask import Flask, request


'''0. We create the flask app'''
app = Flask(__name__)


'''1. Main function'''
def main_script():

    # Url to scrape
    url = 'https://www.wikipedia.org/'

    # Selenium parameters, headless for deploy

    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.6045.160 Safari/537.36")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--window-size=1920,1080")
    
    driver = webdriver.Chrome(options=chrome_options)

    # Opens the url
    driver.get(url)

    # Parse the url with beautifulsoup
    soup = BeautifulSoup(driver.page_source, features="html.parser")

    # Find the class that has the english text
    lang_elements = soup.find_all(class_='central-featured-lang lang2')

    # Get the 'English' text and print it from inside 'strong' attribute
    strong_texts = []

    for element in lang_elements:
        strong_tag = element.find('strong')
        if strong_tag:
            strong_texts.append(strong_tag.get_text())

    print(strong_texts)
    return strong_texts
    

'''2. Configs for the API and Flask'''

@app.route('/', methods=['GET'])
def home():
    if request.method == 'GET':
        return main_script()


if __name__=='__main__':
    app.run(debug=True, host='0.0.0.0')

requirements.txt :

beautifulsoup4
selenium
Flask
webdriver-manager
packaging
gunicorn

Dockerfile:

FROM python:3.9-slim

WORKDIR /

COPY requirements.txt requirements.txt

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

ENV PORT=5000
EXPOSE $PORT

CMD ["python", "wiki_script.py"]

I've tried a couple of changes to how I set up the Chrome options, but nothing seems to work and I'm a little lost. Any help would be appreciated.

Answers

Chosen as BEST ANSWER

After reviewing some documentation and trying a couple of fixes, I finally made it work. Here is the final code that I was able to deploy on Render without any problems.

wiki_script.py :

# BeautifulSoup imports
from bs4 import BeautifulSoup

# Selenium imports
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager



# Flask imports
from flask import Flask, request


'''0. We create the flask app'''
app = Flask(__name__)


'''1. Main function'''
def main_script():

    # Url to scrape
    url = 'https://www.wikipedia.org/'

    # Selenium parameters, headless for deploy

    chrome_options = Options()
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--window-size=1920,1080")
    
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)

    # Opens the url
    driver.get(url)

    # Parse the url with beautifulsoup
    soup = BeautifulSoup(driver.page_source, features="html.parser")

    # Find the class that has the english text
    lang_elements = soup.find_all(class_='central-featured-lang lang2')

    # Get the 'English' text and print it from inside 'strong' attribute
    strong_texts = []

    for element in lang_elements:
        strong_tag = element.find('strong')
        if strong_tag:
            strong_texts.append(strong_tag.get_text())

    print(strong_texts)
    return strong_texts
    

'''2. Configs for the API and Flask'''

@app.route('/', methods=['GET'])
def home():
    if request.method == 'GET':
        return main_script()


if __name__=='__main__':
    app.run(debug=True, host='0.0.0.0')

requirements.txt :

beautifulsoup4==4.12.2
selenium==4.9.1
Flask
webdriver-manager==4.0.0
packaging
gunicorn

Dockerfile:

FROM python:3.10

WORKDIR /app

COPY . /app

RUN pip install --trusted-host pypi.install.org -r requirements.txt

RUN apt-get update && apt-get install -y wget unzip && \
    wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb && \
    apt-get install -y ./google-chrome-stable_current_amd64.deb && \
    rm google-chrome-stable_current_amd64.deb && \
    apt-get clean

CMD ["python", "wiki_script.py"]

The main changes were in the Dockerfile, which now installs Chrome, and in requirements.txt, where I pinned the versions of some libraries.
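Since the fix hinged on Chrome actually being present in the image, a quick startup check can make a missing browser fail loudly instead of surfacing as a WebDriverException at request time. The sketch below is only an illustration: `find_chrome_binary` and the names in `CHROME_CANDIDATES` are assumptions, not part of the deployed code, and the list may need adjusting for other base images.

```python
import shutil
from typing import Iterable, Optional

# Assumed executable names; the google-chrome-stable .deb typically
# provides the first two, the others cover Chromium-based images.
CHROME_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_chrome_binary(candidates: Iterable[str] = CHROME_CANDIDATES) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    chrome_path = find_chrome_binary()
    if chrome_path is None:
        print("No Chrome binary found on PATH; check the Dockerfile install step.")
    else:
        print(f"Chrome found at {chrome_path}")
```

Running this once inside the container (or at app startup) should show whether the Dockerfile's install step took effect before any Selenium code runs.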

(Edit)

The issue may be caused by a driver not being installed in your Docker container. Here's a solution that downloads a compatible ChromeDriver and runs it within your Render environment using webdriver-manager:

requirements.txt:

beautifulsoup4
selenium
Flask
webdriver-manager

wiki_script.py:

# webdriver_manager import
from webdriver_manager.chrome import ChromeDriverManager

# BeautifulSoup imports
from bs4 import BeautifulSoup

# Selenium imports
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

# Flask imports
from flask import Flask, request


'''0. We create the flask app'''
app = Flask(__name__)


'''1. Main function'''
def main_script():
    # Url to scrape
    url = 'https://www.wikipedia.org/'

    # Selenium parameters, headless for deploy
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.6045.160 Safari/537.36")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--window-size=1920,1080")

    # Download and configure ChromeDriver automatically
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
   
    # Opens the url
    driver.get(url)

    # Parse the url with beautifulsoup
    soup = BeautifulSoup(driver.page_source, features="html.parser")

    # Find the class that has the english text
    lang_elements = soup.find_all(class_='central-featured-lang lang2')

    # Get the 'English' text and print it from inside 'strong' attribute
    strong_texts = []

    for element in lang_elements:
        strong_tag = element.find('strong')
        if strong_tag:
            strong_texts.append(strong_tag.get_text())

    print(strong_texts)
    return strong_texts

'''2. Configs for the API and Flask'''

@app.route('/', methods=['GET'])
def home():
    if request.method == 'GET':
        return main_script()


if __name__=='__main__':
    app.run(debug=True, host='0.0.0.0')

Explanation:

Removed the manual configuration of the chromedriver path.
webdriver-manager automatically downloads the ChromeDriver version that is compatible with the Chrome version in the Docker image.
ChromeDriverManager().install() retrieves the appropriate chromedriver path.
Service(ChromeDriverManager().install()) configures the WebDriver to use the downloaded chromedriver.
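
One thing the scripts above leave open is closing the browser: each request starts a new Chrome process and never calls `driver.quit()`, which could pile up processes on a small instance. Here is a minimal sketch of one way to handle that, assuming the same `Service(ChromeDriverManager().install())` setup. `managed_driver`, `chrome_factory` and `fetch_page_source` are hypothetical helper names, and the Selenium imports are deferred into `chrome_factory` so the context manager itself can be exercised without a browser.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver via factory() and always quit it afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs on normal exit and on exceptions, so Chrome is not left behind.
        driver.quit()


def chrome_factory():
    """Build a headless Chrome driver (hypothetical helper)."""
    # Deferred imports: only needed when a real browser is requested.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    for flag in ("--headless", "--no-sandbox", "--disable-dev-shm-usage"):
        options.add_argument(flag)
    return webdriver.Chrome(
        service=Service(ChromeDriverManager().install()),
        options=options,
    )


def fetch_page_source(url, factory=chrome_factory):
    """Load url and return its HTML, closing the browser in every case."""
    with managed_driver(factory) as driver:
        driver.get(url)
        return driver.page_source
```

In `main_script`, a call such as `fetch_page_source(url)` would stand in for the direct `webdriver.Chrome(...)` and `driver.get(url)` lines, with the returned HTML passed to BeautifulSoup as before.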
