I have a classic "it works on my machine" problem: a web scraper that runs successfully on my laptop fails with a persistent error whenever I try to run it in a container.
My minimal reproducible dockerized example consists of the following files:
requirements.txt:
selenium==4.23.1 # 4.23.1
pandas==2.2.2
pandas-gbq==0.22.0
tqdm==4.66.2
Dockerfile:
FROM selenium/standalone-chrome:latest
# Set the working directory in the container
WORKDIR /usr/src/app
# Copy your application files
COPY . .
# Install Python and pip
USER root
RUN apt-get update && apt-get install -y python3 python3-pip python3-venv
# Create a virtual environment
RUN python3 -m venv /usr/src/app/venv
# Activate the virtual environment and install dependencies
RUN . /usr/src/app/venv/bin/activate && \
    pip install --no-cache-dir -r requirements.txt
# Switch back to the selenium user
USER seluser
# Set the entrypoint to activate the venv and run your script
CMD ["/bin/bash", "-c", "source /usr/src/app/venv/bin/activate && python -m scrape_ev_files"]
scrape_ev_files.py (slimmed down to just what’s needed to repro error):
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

def init_driver(local_download_path):
    os.makedirs(local_download_path, exist_ok=True)
    # Set Chrome Options
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--remote-debugging-port=9222")
    prefs = {
        "download.default_directory": local_download_path,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing.enabled": True
    }
    chrome_options.add_experimental_option("prefs", prefs)
    # Set up the driver
    service = Service()
    chrome_options = Options()
    driver = webdriver.Chrome(service=service, options=chrome_options)
    # Set download behavior
    driver.execute_cdp_cmd("Page.setDownloadBehavior", {
        "behavior": "allow",
        "downloadPath": local_download_path
    })
    return driver

if __name__ == "__main__":
    # PARAMS
    ELECTION = '2024 MARCH 5TH DEMOCRATIC PRIMARY'
    ORIGIN_URL = "https://earlyvoting.texas-election.com/Elections/getElectionDetails.do"
    CSV_DL_DIR = "downloaded_files"
    # initialize the driver
    driver = init_driver(local_download_path=CSV_DL_DIR)
Shell commands to reproduce the error:
docker build -t my_scraper . # (no error)
docker run --rm -t my_scraper # (error)
The stack trace from the error is below. Any help would be much appreciated! I’ve tried many iterations of my requirements.txt and Dockerfile to fix this, but the error keeps occurring at this same spot:
File "/workspace/scrape_ev_files.py", line 110, in <module>
driver = init_driver(local_download_path=CSV_DL_DIR)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/scrape_ev_files.py", line 47, in init_driver
driver = webdriver.Chrome(service=service, options=chrome_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/.venv/lib/python3.12/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
super().__init__(
File "/workspace/.venv/lib/python3.12/site-packages/selenium/webdriver/chromium/webdriver.py", line 66, in __init__
super().__init__(command_executor=executor, options=options)
File "/workspace/.venv/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 212, in __init__
self.start_session(capabilities)
File "/workspace/.venv/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 299, in start_session
response = self.execute(Command.NEW_SESSION, caps)["value"]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/.venv/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 354, in execute
self.error_handler.check_response(response)
File "/workspace/.venv/lib/python3.12/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
(session not created: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
2 Answers
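I’m not sure if this is the problem, but there’s certainly an issue with your Python code.
In this code, you repeated the chrome_options = Options() line just before creating the driver. The second assignment replaces the configured object with a new, empty one, so none of your flags or prefs are passed to Chrome.
Again, I’m not sure if this is the problem, but removing the repeated line may spare you future trouble.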
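To illustrate why the repeated line matters, here is a minimal sketch that never launches Chrome and only inspects the options object. The stand-in class in the except branch is my own assumption, added so the snippet also runs where Selenium is not installed; it mimics only the two members used here.

```python
try:
    from selenium.webdriver.chrome.options import Options
except ImportError:
    # Hypothetical stand-in (not part of Selenium) with the same two members,
    # so the sketch still runs without Selenium installed.
    class Options:
        def __init__(self):
            self.arguments = []

        def add_argument(self, argument):
            self.arguments.append(argument)


chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
configured = list(chrome_options.arguments)

chrome_options = Options()  # the repeated line: a brand-new, empty object
after_reassignment = list(chrome_options.arguments)

print(configured)          # ['--headless', '--no-sandbox']
print(after_reassignment)  # []
```

Whatever is passed to webdriver.Chrome after the second assignment carries no flags at all, so Chrome would start in its default, non-headless mode.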
The error you’re encountering is commonly caused by issues when running Chrome in a Docker container. Below are short and long answers (long has more details).
Short Answer
To fix the SessionNotCreatedException error with Selenium Chrome in Docker:
Use Correct Chrome Options: Pass --headless, --no-sandbox, and --disable-dev-shm-usage to Chrome.
Increase Shared Memory: Run the Docker container with increased shared memory.
Check Docker Resources: Ensure Docker has sufficient memory and CPU resources allocated, especially on Docker Desktop.
Add Debugging Flags: Enable additional logging for more insights.
These steps should help resolve issues with running Selenium Chrome in a Docker container.
Long Answer
To run Selenium Chrome reliably in a Docker container, make sure all dependencies are installed, configure Chrome with the appropriate options, increase shared memory, and collect more debugging information.
Solution Steps
Ensure All Chrome Dependencies Are Installed
The Docker image you’re using (selenium/standalone-chrome) should already include the necessary dependencies. Some setups need additional libraries, but because this image comes preconfigured, you shouldn’t need to install anything beyond what you’ve already included.
Set Chrome Options Appropriately
Ensure that your Chrome options are configured correctly for a Docker environment. Your script already sets the following flags, which are good practice:
--headless: Runs Chrome in headless mode (without a GUI).
--no-sandbox: Disables Chrome’s sandbox. This is often necessary in Docker, where Chrome frequently cannot set up its sandbox, but it weakens isolation rather than improving security.
--disable-dev-shm-usage: Avoids using /dev/shm, which may have limited space in Docker containers.
--remote-debugging-port=9222: Enables remote debugging on a fixed port. ChromeDriver normally picks its own debugging port, so this flag is usually optional.
A consolidated init_driver function should create the Options object once, add all of these flags to it, and pass that same object to webdriver.Chrome.
Increase Shared Memory Allocation
The --disable-dev-shm-usage flag reduces the likelihood of shared memory issues, but you can mitigate them further by increasing the shared memory size allocated to the container, for example with the --shm-size option of docker run.
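As a rough sketch of what a larger shared memory allocation looks like, the helper below builds a docker run command with the --shm-size option. The helper name docker_run_command and the 2g value are assumptions for illustration; my_scraper is the image tag from the question.

```python
import shlex


def docker_run_command(image, shm_size="2g"):
    """Build a `docker run` argument list with a larger /dev/shm.

    The 2g default is an assumed starting point, not a measured requirement.
    """
    return ["docker", "run", "--rm", "-t", f"--shm-size={shm_size}", image]


command = docker_run_command("my_scraper")
print(shlex.join(command))  # docker run --rm -t --shm-size=2g my_scraper
# To actually start the container: subprocess.run(command, check=True)
```

The printed line can be pasted directly into a shell in place of the original docker run command.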
Check for Docker-Specific Issues
Ensure that Docker has adequate permissions and resources on your host machine, especially if you’re running on macOS or Windows with Docker Desktop. Sometimes, insufficient memory or CPU allocations to Docker can cause Chrome to crash.
Review the Docker Image
Double-check that the selenium/standalone-chrome image best fits your use case. If the headless configuration fails, another image, such as selenium/standalone-chrome-debug, might provide more insights.
Log Additional Debug Information
You can increase Chrome’s verbosity by adding debugging arguments such as --enable-logging and --v=1, which might help you diagnose the issue further.
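Pulling the flags, download prefs, and logging arguments together, one way to write a consolidated init_driver is sketched below. This is not a confirmed fix. CHROME_FLAGS and build_prefs are hypothetical helper names, the absolute download path is an assumption on my part, and --remote-debugging-port=9222 is left out on the assumption that ChromeDriver manages its own debugging port; add it back if you rely on it.

```python
import os

# Flags from the question plus the two logging flags mentioned above.
CHROME_FLAGS = [
    "--headless",
    "--no-sandbox",
    "--disable-dev-shm-usage",
    "--enable-logging",
    "--v=1",
]


def build_prefs(download_dir):
    """Return the Chrome download prefs used in the question."""
    return {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing.enabled": True,
    }


def init_driver(local_download_path):
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    # Assumption: hand Chrome an absolute path for downloads.
    download_dir = os.path.abspath(local_download_path)
    os.makedirs(download_dir, exist_ok=True)

    chrome_options = Options()  # created once and never reassigned
    for flag in CHROME_FLAGS:
        chrome_options.add_argument(flag)
    chrome_options.add_experimental_option("prefs", build_prefs(download_dir))

    driver = webdriver.Chrome(service=Service(), options=chrome_options)
    driver.execute_cdp_cmd(
        "Page.setDownloadBehavior",
        {"behavior": "allow", "downloadPath": download_dir},
    )
    return driver
```

Calling init_driver("downloaded_files") inside the container would then start Chrome with every flag in CHROME_FLAGS applied, since the options object is built exactly once.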