Overview
I am using a proxy network and want to configure it with Selenium on Python. I have seen many post use the HOST:PORT
method, but proxy networks uses the "URL method" of http://USER:PASSWORD@PROXY:PORT
SeleniumWire
I found SeleniumWire to be a way to connect the "URL method" of proxy networks to a Selenium Scraper. See basic SeleniumWire configuration:
from seleniumwire import webdriver
options = {
'proxy':
{
'http': 'http://USER:PASSWORD@PROXY:PORT',
'https': 'http://USER:PASSWORD@PROXY:PORT'
},
}
driver = webdriver.Chrome(seleniumwire_options=options)
driver.get("https://some_url.com")
This correctly adds and cycles a proxy to the driver, however on many websites the scraper is quickly blocked by CloudFlare. This blocking is something that does not happen when running on Local IP. After searching through SeleniumWire’s GitHub Repository Issues, I found that this is caused by TLS fingerprinting and that there is no current solution to this issue.
Selenium Options
I tried to configure proxies the conventional selenium way:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--proxy-server=http://USER:PASSWORD@PROXY:PORT")
driver = webdriver.Chrome(options=options)
driver.get("https://some_url.com")
A browser instance does open but fails because of a network error. Browser instance does not load in established URL.
Docker Configuration
The end result of this configuration would be running python code within a docker container that is within a Lambda function. Don’t know whether or not that introduces a new level of abstraction or not.
Summary
What other resources can I use to correctly configure my Selenium scraper to use the "URL method" of IP cycling?
Versions
- python 3.9
- selenium 3.141.0
- docker 20.10.11
Support Tickets
Github: https://github.com/SeleniumHQ/selenium/issues/10605
ChromeDriver: https://bugs.chromium.org/p/chromedriver/issues/detail?id=4118
2
Answers
Selenium Extension:
A proxy network, or "URL" proxy, can be configured with Selenium as an extension. Create the following JS script and JSON file:
JS script ("background.js")
JSON File ("manifest.json")
Zip background.js and manifest.json as proxy.zip and write the following:
try to use this