
Overview

I am using a proxy network and want to configure it with Selenium in Python. I have seen many posts use the HOST:PORT method, but proxy networks use the "URL method": http://USER:PASSWORD@PROXY:PORT
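
To make the difference between the two formats concrete, here is a minimal sketch using only the Python standard library. It splits a "URL method" proxy string into the HOST:PORT part and the credentials, and builds one from its pieces. The function names `split_proxy_url` and `build_proxy_url` and the host `proxy.example.com` are hypothetical, chosen for illustration only; percent-encoding the credentials is an assumption about what a provider expects when a password contains characters such as `@` or `:`.

```python
from urllib.parse import quote, unquote, urlsplit


def split_proxy_url(proxy_url):
    """Split 'http://USER:PASSWORD@PROXY:PORT' into its components."""
    parts = urlsplit(proxy_url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # urlsplit leaves credentials percent-encoded, so decode them here
        "username": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
    }


def build_proxy_url(user, password, host, port, scheme="http"):
    """Build a proxy URL, percent-encoding the credentials."""
    # safe="" also encodes '@', ':' and '/', which would otherwise break the URL
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


if __name__ == "__main__":
    url = build_proxy_url("user", "p@ss:word", "proxy.example.com", 8080)
    print(url)
    print(split_proxy_url(url))
```

The HOST:PORT method corresponds to the `host` and `port` fields alone; the "URL method" additionally carries `username` and `password`, which have to be supplied to the browser in some other way when a tool only accepts HOST:PORT.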

SeleniumWire

I found SeleniumWire to be a way to connect the "URL method" of proxy networks to a Selenium scraper. A basic SeleniumWire configuration looks like this:

from seleniumwire import webdriver

options = {
    'proxy':
    {
        'http': 'http://USER:PASSWORD@PROXY:PORT',
        'https': 'http://USER:PASSWORD@PROXY:PORT'
    },
}

driver = webdriver.Chrome(seleniumwire_options=options)
driver.get("https://some_url.com")

This correctly attaches the proxy to the driver and cycles IPs. However, on many websites the scraper is quickly blocked by Cloudflare, which does not happen when running from my local IP. After searching through the issues in SeleniumWire's GitHub repository, I found that this is caused by TLS fingerprinting and that there is currently no solution.

Selenium Options

I also tried to configure the proxy the conventional Selenium way:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--proxy-server=http://USER:PASSWORD@PROXY:PORT")
driver = webdriver.Chrome(options=options)
driver.get("https://some_url.com")

A browser instance opens, but the page fails with a network error and the requested URL never loads.


Docker Configuration

The end goal is to run this Python code in a Docker container inside a Lambda function. I don't know whether that introduces another layer of abstraction.

Summary

What other resources can I use to correctly configure my Selenium scraper to use the "URL method" of IP cycling?

Versions

  • python 3.9
  • selenium 3.141.0
  • docker 20.10.11

Support Tickets

GitHub: https://github.com/SeleniumHQ/selenium/issues/10605

ChromeDriver: https://bugs.chromium.org/p/chromedriver/issues/detail?id=4118

2 Answers


  1. Chosen as BEST ANSWER

    Selenium Extension:

    A proxy network, or "URL" proxy, can be configured with Selenium as an extension. Create the following JS script and JSON file:

    JS script ("background.js")

    var config = {
        mode: "fixed_servers",
        rules: {
            singleProxy: {
                scheme: "http",
                host: "<PROXY>",
                port: parseInt(<PORT>)
            },
            bypassList: ["foobar.com"]
        }
    };

    chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

    function callbackFn(details) {
        return {
            authCredentials: {
                username: "<USER>",
                password: "<PASSWORD>"
            }
        };
    }

    chrome.webRequest.onAuthRequired.addListener(
        callbackFn,
        {urls: ["<all_urls>"]},
        ['blocking']
    );
    

    JSON File ("manifest.json")

    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {
            "scripts": ["background.js"]
        },
        "minimum_chrome_version":"22.0.0"
    }
    

    Zip background.js and manifest.json together as proxy.zip, then load the archive as an extension:

    from selenium import webdriver
    
    options = webdriver.ChromeOptions()
    options.add_extension("proxy.zip")
    driver = webdriver.Chrome(options=options)
    
    driver.get("https://whatismyipaddress.com/")
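
The zipping step above can also be scripted instead of done by hand. The following is a minimal sketch, not part of the original answer: it fills the same background.js and manifest.json from a "URL method" proxy string and writes proxy.zip with the standard library. The function name `build_proxy_extension`, the `__HOST__`-style placeholders, and the host `proxy.example.com` are hypothetical, and the bypass list is left empty here as an assumption.

```python
import json
import re
import zipfile
from urllib.parse import unquote, urlsplit

# Same manifest as in the answer above.
MANIFEST = {
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy", "tabs", "unlimitedStorage", "storage",
        "<all_urls>", "webRequest", "webRequestBlocking",
    ],
    "background": {"scripts": ["background.js"]},
    "minimum_chrome_version": "22.0.0",
}

# background.js with hypothetical __NAME__ placeholders for the proxy details.
BACKGROUND_JS = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {
            scheme: __SCHEME__,
            host: __HOST__,
            port: __PORT__
        },
        bypassList: []
    }
};
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
chrome.webRequest.onAuthRequired.addListener(
    function(details) {
        return {
            authCredentials: {
                username: __USER__,
                password: __PASSWORD__
            }
        };
    },
    {urls: ["<all_urls>"]},
    ["blocking"]
);
"""


def build_proxy_extension(proxy_url, zip_path="proxy.zip"):
    """Write a Chrome extension zip for 'http://USER:PASSWORD@PROXY:PORT'."""
    parts = urlsplit(proxy_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("proxy_url must include a host and a port")
    values = {
        "__SCHEME__": parts.scheme,
        "__HOST__": parts.hostname,
        "__PORT__": parts.port,
        "__USER__": unquote(parts.username or ""),
        "__PASSWORD__": unquote(parts.password or ""),
    }
    # json.dumps yields valid JavaScript literals (quoted strings, bare numbers);
    # a single regex pass avoids re-substituting text inside inserted values.
    pattern = re.compile("|".join(re.escape(key) for key in values))
    background_js = pattern.sub(lambda m: json.dumps(values[m.group(0)]), BACKGROUND_JS)
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("manifest.json", json.dumps(MANIFEST, indent=4))
        archive.writestr("background.js", background_js)
    return zip_path


if __name__ == "__main__":
    print(build_proxy_extension("http://user:secret@proxy.example.com:8080"))
```

The returned path can then be passed to `options.add_extension(...)` exactly as in the snippet above. Generating the archive at start-up keeps the credentials out of files that are edited by hand, which may be convenient when the proxy details come from environment variables.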
    

  2. Try adding this option:

    options.add_argument("--ignore-certificate-errors")
    