I’m having trouble deploying a Selenium scraping script on Render. The script runs fine locally, but when I deploy it on Render and hit the endpoint that triggers the script, I get this error:
WebDriverException
selenium.common.exceptions.WebDriverException: Message: Service /root/.cache/selenium/chromedriver/linux64/125.0.6422.60/chromedriver unexpectedly exited. Status code was: 127
The script structure is the following:
Here is a copy of the script; if something is unclear, please let me know:
wiki_script.py :
# BeautifulSoup imports
from bs4 import BeautifulSoup
# Selenium imports
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
# Flask imports
from flask import Flask, request
'''0. We create the flask app'''
app = Flask(__name__)
'''1. Main function'''
def main_script():
    # Url to scrape
    url = 'https://www.wikipedia.org/'
    # Selenium parameters, headless for deploy
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.6045.160 Safari/537.36")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--window-size=1920,1080")
    driver = webdriver.Chrome(options=chrome_options)
    # Opens the url
    driver.get(url)
    # Parse the url with beautifulsoup
    soup = BeautifulSoup(driver.page_source, features="html.parser")
    # Find the class that has the english text
    lang_elements = soup.find_all(class_='central-featured-lang lang2')
    # Get the 'English' text and print it from inside 'strong' attribute
    strong_texts = []
    for element in lang_elements:
        strong_tag = element.find('strong')
        if strong_tag:
            strong_texts.append(strong_tag.get_text())
    print(strong_texts)
    return strong_texts
'''2. Configs for the API and Flask'''
@app.route('/', methods = ['GET'])
def home():
    if (request.method == 'GET'):
        return main_script()
if __name__=='__main__':
    app.run(debug=True, host='0.0.0.0')
requirements.txt :
beautifulsoup4
selenium
Flask
webdriver-manager
packaging
gunicorn
Dockerfile:
FROM python:3.9-slim
WORKDIR /
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENV PORT=5000
EXPOSE $PORT
CMD ["python", "wiki_script.py"]
I’ve tried a couple of changes to how I set up the Chrome options, but nothing seems to work and I’m a little lost. Any help would be appreciated.
2 Answers
After reviewing some documentation and trying a couple of fixes, I finally got it working. Below is the final code that I was able to deploy on Render without any problems.
wiki_script.py :
requirements.txt :
Dockerfile:
The main changes were made in the Dockerfile, so that Chrome installs successfully, and I pinned the versions of some libraries, as seen in requirements.txt.
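For anyone who wants to check whether a missing browser is really the problem before changing the Dockerfile, here is a minimal diagnostic sketch. It assumes a Linux container where `ldd` is available; the candidate binary names and the helper names (`find_browser`, `parse_missing_libs`, `missing_libs`) are hypothetical and not taken from the code above. Status code 127 usually means the binary could not be started at all, for example because a shared library it depends on is not installed.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence

# Hypothetical candidate names; adjust to whatever your image actually installs.
BROWSER_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_browser(candidates: Sequence[str] = BROWSER_CANDIDATES) -> Optional[str]:
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def parse_missing_libs(ldd_output: str) -> List[str]:
    """Extract the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def missing_libs(binary: str) -> List[str]:
    """Run ldd on a binary and list its unresolved shared libraries."""
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=False
    )
    return parse_missing_libs(result.stdout)


if __name__ == "__main__":
    print("browser found at:", find_browser())
```

Calling `missing_libs()` inside the container on the chromedriver path from the error message (`/root/.cache/selenium/chromedriver/linux64/125.0.6422.60/chromedriver`) should show which libraries, if any, the slim base image lacks.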
The issue may be that no driver is installed in your Docker container. Here’s a solution that downloads a compatible ChromeDriver and runs it within your Render environment using webdriver-manager:
requirements.txt:
wiki_script.py:
Explanation:
webdriver-manager is used to automatically download the ChromeDriver version that is compatible with the Chrome version in the Docker container.
ChromeDriverManager().install() retrieves the appropriate chromedriver path.
Service(ChromeDriverManager().install()) configures the WebDriver to use the downloaded chromedriver.
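As a rough illustration of the points above, the driver setup could look like the sketch below. It reuses the Chrome flags from the question; the names `HEADLESS_FLAGS`, `build_driver` and `fetch_title` are hypothetical, and it assumes `selenium` (version 4) and `webdriver-manager` are installed. Note that webdriver-manager only downloads the driver, so Chrome itself still has to be present in the image.

```python
from typing import Sequence

# Flags taken from the question's Chrome options.
HEADLESS_FLAGS = (
    "--headless",
    "--no-sandbox",
    "--disable-dev-shm-usage",
    "--disable-gpu",
    "--window-size=1920,1080",
)


def build_driver(flags: Sequence[str] = HEADLESS_FLAGS):
    """Create a Chrome WebDriver whose chromedriver comes from webdriver-manager."""
    # Imported here so this module can be loaded even where these
    # third-party packages are not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    for flag in flags:
        options.add_argument(flag)

    # install() downloads a chromedriver if needed and returns its local path.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)


def fetch_title(url: str) -> str:
    """Open a page, return its title, and always shut the browser down."""
    driver = build_driver()
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()
```

In `main_script()` this would replace the direct `webdriver.Chrome(options=chrome_options)` call; quitting the driver in a `finally` block keeps browser processes from piling up across requests.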