
I’m having some issues deploying a Selenium scraping script on Render. The script runs fine locally, but when I deploy it on Render and access the endpoint that triggers the script, I get this error:

WebDriverException
selenium.common.exceptions.WebDriverException: Message: Service /root/.cache/selenium/chromedriver/linux64/125.0.6422.60/chromedriver unexpectedly exited. Status code was: 127

The script structure is the following:

[Image: project structure]

Here is a copy of the script and its supporting files; if something is unclear, please let me know:

wiki_script.py :

# BeautifulSoup imports
from bs4 import BeautifulSoup

# Selenium imports
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

# Flask imports
from flask import Flask, request


'''0. We create the flask app'''
app = Flask(__name__)


'''1. Main function'''
def main_script():

    # Url to scrape
    url = 'https://www.wikipedia.org/'

    # Selenium parameters, headless for deploy

    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.6045.160 Safari/537.36")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--window-size=1920,1080")
    
    driver = webdriver.Chrome(options=chrome_options)

    # Opens the url
    driver.get(url)

    # Parse the url with beautifulsoup
    soup = BeautifulSoup(driver.page_source, features="html.parser")

    # Find the class that has the english text
    lang_elements = soup.find_all(class_='central-featured-lang lang2')

    # Get the 'English' text and print it from inside 'strong' attribute
    strong_texts = []

    for element in lang_elements:
        strong_tag = element.find('strong')
        if strong_tag:
            strong_texts.append(strong_tag.get_text())

    print(strong_texts)
    return strong_texts
    

'''2. Configs for the API and Flask'''

@app.route('/', methods = ['GET'])

def home():
    if (request.method == 'GET'):

        return main_script()


if __name__=='__main__':
    app.run(debug=True, host='0.0.0.0')

requirements.txt :

beautifulsoup4
selenium
Flask
webdriver-manager
packaging
gunicorn

Dockerfile:

FROM python:3.9-slim

WORKDIR /

COPY requirements.txt requirements.txt

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

ENV PORT=5000
EXPOSE $PORT

CMD ["python", "wiki_script.py"]

I’ve tried a couple of changes to how I set up the Chrome options, but nothing seems to work and I’m a little lost. Any help would be appreciated.

2 Answers


  1. Chosen as BEST ANSWER

    After reviewing some documentation and trying a couple of fixes, I finally got it working. Here is the final code that I was able to deploy on Render without any problems.

    wiki_script.py :

    # BeautifulSoup imports
    from bs4 import BeautifulSoup
    
    # Selenium imports
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager
    
    
    
    # Flask imports
    from flask import Flask, request
    
    
    '''0. We create the flask app'''
    app = Flask(__name__)
    
    
    '''1. Main function'''
    def main_script():
    
        # Url to scrape
        url = 'https://www.wikipedia.org/'
    
        # Selenium parameters, headless for deploy
    
        chrome_options = Options()
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--headless")
        chrome_options.add_argument("--disable-dev-shm-usage")
        chrome_options.add_argument("--disable-gpu")
        chrome_options.add_argument("--window-size=1920,1080")
        
        driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
    
        # Opens the url
        driver.get(url)
    
        # Parse the url with beautifulsoup
        soup = BeautifulSoup(driver.page_source, features="html.parser")
    
        # Find the class that has the english text
        lang_elements = soup.find_all(class_='central-featured-lang lang2')
    
        # Get the 'English' text and print it from inside 'strong' attribute
        strong_texts = []
    
        for element in lang_elements:
            strong_tag = element.find('strong')
            if strong_tag:
                strong_texts.append(strong_tag.get_text())
    
        print(strong_texts)
        return strong_texts
        
    
    '''2. Configs for the API and Flask'''
    
    @app.route('/', methods = ['GET'])
    
    def home():
        if (request.method == 'GET'):
    
            return main_script()
    
    
    if __name__=='__main__':
        app.run(debug=True, host='0.0.0.0')
    

    requirements.txt :

    beautifulsoup4==4.12.2
    selenium==4.9.1
    Flask
    webdriver-manager==4.0.0
    packaging
    gunicorn
    

    Dockerfile:

    FROM python:3.10
    
    WORKDIR /app
    
    COPY . /app
    
    RUN pip install --no-cache-dir -r requirements.txt
    
    RUN apt-get update && apt-get install -y wget unzip && \
        wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb && \
        apt-get install -y ./google-chrome-stable_current_amd64.deb && \
        rm google-chrome-stable_current_amd64.deb && \
        apt-get clean
    
    CMD ["python", "wiki_script.py"]
    

    The main changes were in the Dockerfile, which now installs Chrome, and I pinned the versions of some libraries, as seen in requirements.txt.
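
Status code 127 in the original error usually means that a binary, or a shared library it depends on, could not be found. The path in the error message shows that chromedriver itself had already been downloaded into /root/.cache/selenium, so a plausible cause (an assumption, not something confirmed by logs in this thread) is that the slim image contained no Chrome and lacked libraries chromedriver needs. A quick check like the sketch below, run inside the container, can confirm whether a browser binary is present. The helper name find_browser and the list of candidate binary names are illustrative only.

```python
import shutil
from typing import Iterable, Optional

# Common names for the Chrome/Chromium executable on Linux.
# This list is illustrative, not exhaustive.
CANDIDATE_BROWSERS = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_browser(candidates: Iterable[str] = CANDIDATE_BROWSERS) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    browser = find_browser()
    if browser is None:
        print("No Chrome/Chromium binary found on PATH; install one in the Dockerfile.")
    else:
        print(f"Found browser at {browser}")
```

If this reports that no browser is found, installing Chrome in the image, as the Dockerfile above does, is the likely fix.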


  2. The error may be caused by a missing driver in your Docker container. Here’s a solution that uses webdriver-manager to download a compatible ChromeDriver and run it within your Render environment:

    requirements.txt:

    beautifulsoup4
    selenium
    Flask
    webdriver-manager
    

    wiki_script.py:

    # webdriver_manager import
    from webdriver_manager.chrome import ChromeDriverManager
    
    # BeautifulSoup imports
    from bs4 import BeautifulSoup
    
    # Selenium imports
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.chrome.service import Service
    
    # Flask imports
    from flask import Flask, request
    
    
    '''0. We create the flask app'''
    app = Flask(__name__)
    
    
    '''1. Main function'''
    def main_script():
        # Url to scrape
        url = 'https://www.wikipedia.org/'
    
        # Selenium parameters, headless for deploy
        chrome_options = Options()
        chrome_options.add_argument("--headless")
        chrome_options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.6045.160 Safari/537.36")
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        chrome_options.add_argument("--disable-gpu")
        chrome_options.add_argument("--window-size=1920,1080")
    
        # Download and configure ChromeDriver automatically
        driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
       
        # Opens the url
        driver.get(url)
    
        # Parse the url with beautifulsoup
        soup = BeautifulSoup(driver.page_source, features="html.parser")
    
        # Find the class that has the english text
        lang_elements = soup.find_all(class_='central-featured-lang lang2')
    
        # Get the 'English' text and print it from inside 'strong' attribute
        strong_texts = []
    
        for element in lang_elements:
            strong_tag = element.find('strong')
            if strong_tag:
                strong_texts.append(strong_tag.get_text())
    
        print(strong_texts)
        return strong_texts
    
    '''2. Configs for the API and Flask'''
    
    @app.route('/', methods = ['GET'])
    
    def home():
        if (request.method == 'GET'):
    
            return main_script()
    
    
    if __name__=='__main__':
        app.run(debug=True, host='0.0.0.0')
    

    Explanation:

    • Removed the manual configuration of the chromedriver path.
    • webdriver-manager automatically downloads the ChromeDriver version that is compatible with the Chrome version installed in the Docker container.
    • ChromeDriverManager().install() retrieves the appropriate chromedriver path.
    • Service(ChromeDriverManager().install()) configures the WebDriver to use the downloaded chromedriver.
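
The last two bullets can be written out as explicit steps. The sketch below does that and also wraps the driver in a context manager so that driver.quit() runs after every request, which the posted code does not do. In a long-lived web service, leftover Chrome processes may accumulate, though that is an assumption about the deployment rather than something reported in this thread. The names make_driver and browser_session are hypothetical, and the Selenium imports are deferred into make_driver so the module can be imported even where Selenium is not installed.

```python
from contextlib import contextmanager

# Flags taken from the script above.
CHROME_FLAGS = [
    "--headless",
    "--no-sandbox",
    "--disable-dev-shm-usage",
    "--disable-gpu",
    "--window-size=1920,1080",
]


def make_driver():
    """Create a headless Chrome driver using webdriver-manager (hypothetical helper)."""
    # Imported lazily so this module loads without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    for flag in CHROME_FLAGS:
        options.add_argument(flag)

    # Step 1: download (or reuse a cached) chromedriver and get its path.
    driver_path = ChromeDriverManager().install()
    # Step 2: point the Selenium service at that binary.
    service = Service(executable_path=driver_path)
    # Step 3: start Chrome through that service.
    return webdriver.Chrome(service=service, options=options)


@contextmanager
def browser_session(factory=make_driver):
    """Yield a driver and always call quit() on it, even if scraping fails."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()


# Usage inside main_script (not executed here):
#
#     with browser_session() as driver:
#         driver.get(url)
#         html = driver.page_source
```

Passing the factory as a parameter keeps the cleanup logic testable without launching a real browser.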