
I have written the following code to log in to a website. So far it simply loads the webpage and accepts the cookies, but when I try to log in by clicking the login button, the page hangs and the login page never loads.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service
from selenium.common.exceptions import NoSuchElementException


# Accept consent cookies
def accept_cookies(browser):
    try:
        browser.find_element(By.XPATH, '//*[@id="gdpr-banner-accept"]').click()
    except NoSuchElementException:
        print('Cookies already accepted')


# Webpage parameters
base_site = "https://www.ebay-kleinanzeigen.de/"

# Set up the remote-controlled browser (Selenium 4 style: driver path goes through Service)
fireFoxOptions = webdriver.FirefoxOptions()
#fireFoxOptions.add_argument("--headless")
service = Service(executable_path='/home/Webdriver/bin/geckodriver')
browser = webdriver.Firefox(service=service, options=fireFoxOptions)
browser.get(base_site)
accept_cookies(browser)

# Click the login pop-up (index 1 = second element whose text contains 'Einloggen')
browser.find_elements(By.XPATH, "//*[contains(text(), 'Einloggen')]")[1].click()

Note: there are two login buttons (one in the pop-up and one on the page); I’ve tried both with the same result.

I have done something similar with other websites without any problems, so I am curious why it doesn’t work here.

Any thoughts on why this might be, or how to get around it?


Answers


  1. I modified your code a bit, adding a couple of optional arguments, and on execution I got the following result:

    • Code Block:

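      from selenium import webdriver
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC

      # Assumed driver setup: the original snippet used `driver` without defining it
      options = webdriver.ChromeOptions()
      options.add_argument("start-maximized")
      driver = webdriver.Chrome(options=options)

      driver.get("https://www.ebay-kleinanzeigen.de/")
      # Wait up to 20 s for the cookie-banner button to be clickable, then accept
      WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@id='gdpr-banner-accept']"))).click()
      # Wait up to 20 s for the 'Einloggen' (log in) link, then click it
      WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(text(), 'Einloggen')]"))).click()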
      
    • Observation: Like you, I found that the page hangs and the login page never loads.
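    A "hang" is easier to pin down when the wait after the click is bounded and fails loudly. WebDriverWait(driver, 20).until(...) in the code block does this by repeatedly polling a condition until it holds or the timeout expires. The sketch below is a hypothetical plain-Python helper, wait_until, that mirrors that polling idea (including WebDriverWait's default 0.5 s poll interval) so the logic can be read and tested without a browser; it is not Selenium's implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 20.0, poll: float = 0.5) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper mirroring the idea behind WebDriverWait(...).until(...).
    Returns the truthy value; raises TimeoutError if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

    Usage idea with a real driver: record start_url = driver.current_url before clicking 'Einloggen', then call wait_until(lambda: driver.current_url != start_url). A TimeoutError then confirms that the login page never loaded, rather than merely being slow.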


    Deep Dive

    While inspecting the DOM tree of the webpage, you will find that some of the <script> and <link> tags refer to JavaScript files whose paths contain the keyword dist. For example:

    • <script type="text/javascript" async="" src="/static/js/lib/node_modules/@ebayk/prebid/dist/prebid.10o55zon5xxyi.js"></script>
    • window.BelenConf.prebidFileSrc = '/static/js/lib/node_modules/@ebayk/prebid/dist/prebid.10o55zon5xxyi.js';

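    I took this as an indication that the website may be protected by the bot-management provider Distil Networks, and that navigation driven by ChromeDriver is being detected and subsequently blocked (although dist is also the conventional name for an npm package's build-output folder, so the path alone is not conclusive).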
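    To repeat the inspection above without opening the browser's developer tools, one option is to scan the page source for asset paths containing dist. This is only an illustrative sketch: DistAssetFinder and find_dist_assets are hypothetical names, it uses Python's standard-library html.parser, and it only looks at src/href attributes of <script> and <link> tags, so strings inside inline scripts (like the window.BelenConf example) are not caught.

```python
from html.parser import HTMLParser


class DistAssetFinder(HTMLParser):
    """Collect src/href values of <script>/<link> tags whose path contains `keyword`."""

    def __init__(self, keyword: str = "/dist/"):
        super().__init__()
        self.keyword = keyword
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "link"):
            return
        for name, value in attrs:
            # Boolean attributes arrive with value None, so guard before searching
            if name in ("src", "href") and value and self.keyword in value:
                self.matches.append(value)


def find_dist_assets(html: str, keyword: str = "/dist/") -> list:
    """Return every matching asset path found in `html`, in document order."""
    parser = DistAssetFinder(keyword)
    parser.feed(html)
    parser.close()
    return parser.matches
```

    With Selenium, the input would typically be driver.page_source after the page has loaded.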


    Distil

    According to the article "There Really Is Something About Distil.it…":

    Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.

    Further,

    "One pattern with Selenium was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium as the tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".

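    The quote does not say which signals Distil relies on. Purely as an illustration, the sketch below reads a few browser properties that public bot-detection test pages commonly inspect; navigator.webdriver in particular is defined by the W3C WebDriver specification to be true while a browser is under automation. PROBES, collect_fingerprint and automation_hints are hypothetical names, and the hint rules are assumptions, not anything Distil has documented. The driver argument is expected to be a Selenium WebDriver; only its execute_script method is used.

```python
# Hypothetical JavaScript probes for properties bot detectors often look at
PROBES = {
    "webdriver": "return navigator.webdriver === true;",
    "user_agent": "return navigator.userAgent;",
    "plugin_count": "return navigator.plugins.length;",
}


def collect_fingerprint(driver) -> dict:
    """Run each probe in the current page via the driver's execute_script."""
    return {name: driver.execute_script(js) for name, js in PROBES.items()}


def automation_hints(fingerprint: dict) -> list:
    """Return human-readable hints that the browser looks automated (assumed rules)."""
    hints = []
    if fingerprint.get("webdriver"):
        hints.append("navigator.webdriver is true")
    if "HeadlessChrome" in (fingerprint.get("user_agent") or ""):
        hints.append("user agent advertises HeadlessChrome")
    if fingerprint.get("plugin_count") == 0:
        hints.append("no browser plugins reported")
    return hints
```

    Usage idea: after driver.get(...), print(automation_hints(collect_fingerprint(driver))) shows which of these signals the page can see.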


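  2. The following uses the selenium-stealth package to mask common automation fingerprints in Chrome:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium_stealth import stealth
    import time

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")

    # options.add_argument("--headless")

    # Drop the enable-automation switch ("controlled by automated test software" infobar)
    # and the automation extension
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    service = Service(executable_path=r"C:\Users\DIPRAJ\Programming\adclick_bot\chromedriver.exe")
    driver = webdriver.Chrome(service=service, options=options)

    # Override navigator/WebGL properties that bot detectors commonly inspect
    stealth(driver,
            languages=["en-US", "en"],
            vendor="Google Inc.",
            platform="Win32",
            webgl_vendor="Intel Inc.",
            renderer="Intel Iris OpenGL Engine",
            fix_hairline=True,
            )

    # Bot-detection test page that reports which fingerprint checks pass or fail
    url = "https://bot.sannysoft.com/"
    driver.get(url)
    time.sleep(5)
    driver.quit()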
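    selenium-stealth applies a set of JavaScript overrides of this kind. To show what a single one amounts to, here is a hand-written sketch that hides navigator.webdriver by registering a script through the Chrome DevTools Protocol command Page.addScriptToEvaluateOnNewDocument, which Selenium's Chrome driver exposes as execute_cdp_cmd. HIDE_WEBDRIVER_JS and hide_webdriver_flag are hypothetical names; this is not selenium-stealth's actual code, and it covers only one signal.

```python
# Hypothetical override: make navigator.webdriver read as undefined in every page
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"
)


def hide_webdriver_flag(driver) -> None:
    """Register HIDE_WEBDRIVER_JS to run before page scripts on each new document.

    `driver` is expected to be a Selenium Chrome/Chromium driver, which provides
    execute_cdp_cmd(cmd, cmd_args).
    """
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": HIDE_WEBDRIVER_JS},
    )
```

    Call hide_webdriver_flag(driver) right after creating the driver and before driver.get(...), since the script only applies to documents loaded after it is registered.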
    