
I am trying to create an application that scrapes certain e-commerce websites. I am using Selenium for this and deploying the application on an EC2 instance running CentOS. I developed the code locally, where it worked, but it fails with errors on the remote machine.

The code that I am using

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

ser = Service(ChromeDriverManager().install())
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")

selenium_driver = webdriver.Chrome(service=ser, options=chrome_options)

url = 'https://www.everlane.com/products/womens-cloud-cable-knit-vest-oatmeal?collection=womens-newest-arrivals'

selenium_driver.get(url)

title = selenium_driver.find_element(By.XPATH, '//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span')
print(title.text)

When I run this code on the remote machine, I get an error with the following stack trace:

Traceback (most recent call last):
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2091, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2076, in wsgi_app
    response = self.handle_exception(e)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/ec2-user/price_tracker/flask_api.py", line 22, in home
    title, price, isSizeAvailable, shop = prices.checkInfoByShop(url, size)
  File "/home/ec2-user/price_tracker/check_prices.py", line 132, in checkInfoByShop
    secondaryPriceXPath=secondaryPriceXPath)
  File "/home/ec2-user/price_tracker/check_prices.py", line 61, in checkSelenium
    title = self.selenium_driver.find_element(By.XPATH, titleXPath)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 1246, in find_element
    'value': value})['value']
  File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span"}
  (Session info: headless chrome=96.0.4664.110)
Stacktrace:
#0 0x559979e8dee3 <unknown>
#1 0x55997995b608 <unknown>
#2 0x559979991aa1 <unknown>
#3 0x559979991c61 <unknown>
#4 0x5599799c4714 <unknown>
#5 0x5599799af29d <unknown>
#6 0x5599799c23bc <unknown>
#7 0x5599799af163 <unknown>
#8 0x559979984bfc <unknown>
#9 0x559979985c05 <unknown>
#10 0x559979ebfbaa <unknown>
#11 0x559979ed5651 <unknown>
#12 0x559979ec0b05 <unknown>
#13 0x559979ed6a68 <unknown>
#14 0x559979eb505f <unknown>
#15 0x559979ef1818 <unknown>
#16 0x559979ef1998 <unknown>
#17 0x559979f0ceed <unknown>
#18 0x7ff5dd53b40b <unknown>

For debugging purposes, I tried to read the entire body of the webpage using

body = selenium_driver.find_element(By.XPATH, '/html/body')
print(body.text)

which returns

"We're sorry, something has gone wrong. Please try again.\nIf you continue to have trouble, please contact us at [email protected]\nChecking your browser before accessing www.everlane.com.\nThis process is automatic. Your browser will redirect to your requested content shortly.\nPlease allow up to 5 seconds…\nDebugging Information\nIP Address\n<ip-address>\nRay ID\n6c57184d797805a0"

I understand that my request might be getting blocked for some reason but is there a way to bypass this?

I have tried adding wait statements in the hope of landing on the redirect but nothing has worked so far.
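
The debugging step above can be automated so the scraper logs a clear reason instead of raising `NoSuchElementException`. The following is a minimal sketch, not part of the original code: the function name `inspect_body_text` is hypothetical, and the marker strings are assumptions taken from the body text quoted above, so other sites may word their interstitial differently.

```python
import re

# Marker strings copied from the body text quoted in the question; treat them
# as assumptions, since other sites may phrase their interstitial differently.
CHALLENGE_MARKERS = ("Checking your browser", "Ray ID")


def inspect_body_text(body_text):
    """Report whether the text looks like a challenge page and extract the Ray ID."""
    blocked = any(marker in body_text for marker in CHALLENGE_MARKERS)
    # The Ray ID appears on the line after the "Ray ID" label as a hex string.
    match = re.search(r"Ray ID\s*([0-9a-f]+)", body_text)
    return {"blocked": blocked, "ray_id": match.group(1) if match else None}


sample = (
    "Checking your browser before accessing www.everlane.com.\n"
    "Debugging Information\nIP Address\n<ip-address>\nRay ID\n6c57184d797805a0"
)
print(inspect_body_text(sample))
```

In the scraper, the argument would be `body.text` from the `find_element(By.XPATH, '/html/body')` call above; a `blocked` result means the product XPath cannot match yet, whatever the locator.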

4 Answers


  1. That message means the site served different content from the page you expected, so your code is working as intended. I’d have Selenium wait for an element to be visible (Read more here). If you don’t want to do that, you can also wait for the page to redirect. How to do that is answered in another SO question here.

  2. I’d suggest using WebDriverWait to wait for the element to become visible once the page has loaded.

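    # First argument is the driver, second is the timeout in seconds (20 is an arbitrary choice).
    wait = WebDriverWait(selenium_driver, 20)
    # Single quotes outside so the double quotes inside the XPath do not end the string.
    elem = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span')))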
    print(elem.text)
    

    Imports:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait 
    from selenium.webdriver.support import expected_conditions as EC
    
  3. Because of the message:

    Checking your browser before accessing www.everlane.com.
    This process is automatic. Your browser will redirect to your requested content shortly.
    

    This site seems to have Cloudflare protection enabled. See the reference.

    I suggest trying selenium-stealth:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager
    from selenium_stealth import stealth
    
    ser = Service(ChromeDriverManager().install())
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(service=ser, options=options)
    
    stealth(driver,
            languages=["en-US", "en"],
            vendor="Google Inc.",
            platform="Win32",
            webgl_vendor="Intel Inc.",
            renderer="Intel Iris OpenGL Engine",
            fix_hairline=True,
            )
    
    url = 'https://www.everlane.com/products/womens-cloud-cable-knit-vest-oatmeal?collection=womens-newest-arrivals'
    
    driver.get(url)
    # The driver in this snippet is named `driver`, not `selenium_driver`.
    title = driver.find_element(By.XPATH, '//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span')
    print(title.text)
    


  4. This message…

    "We're sorry, something has gone wrong. Please try again.\nIf you continue to have trouble, please contact us at [email protected]\nChecking your browser before accessing www.everlane.com.\nThis process is automatic. Your browser will redirect to your requested content shortly.\nPlease allow up to 5 seconds…\nDebugging Information\nIP Address\n<ip-address>\nRay ID\n6c57184d797805a0"
    

    …implies that the Selenium-driven, ChromeDriver-initiated browsing context was detected as a bot.

    However, I was able to bypass the detection by passing a few arguments, as follows:

    • Code Block:

      from selenium import webdriver
      from selenium.webdriver.chrome.options import Options
      from selenium.webdriver.chrome.service import Service
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
      
      options = Options()
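      # Passing the flag as an argument works across Selenium 4 releases.
      options.add_argument("--headless")
      options.add_argument("start-maximized")
      # Pass both switches in one list; a second "excludeSwitches" call would overwrite the first.
      options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
      options.add_experimental_option('useAutomationExtension', False)
      options.add_argument('--disable-blink-features=AutomationControlled')
      # Raw string so the backslashes in the Windows path are not treated as escapes.
      s = Service(r'C:\BrowserDrivers\chromedriver.exe')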
      driver = webdriver.Chrome(service=s, options=options)
      driver.get("https://www.everlane.com/products/womens-cloud-cable-knit-vest-oatmeal?collection=womens-newest-arrivals")
      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[@class='product-heading__name']/span"))).text)
      driver.quit()
      
    • Console Output:

      The Cloud Cable-Knit Vest
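
Several of the answers above suggest waiting for the redirect instead of sleeping for a fixed time. As a minimal sketch of that idea, `WebDriverWait.until` accepts any callable that takes the driver, so a custom condition can wait until the interstitial text is gone. The helper name `body_lacks` and the marker strings are assumptions, and `FakeDriver` and `FakeElement` are hypothetical stand-ins used only so the sketch runs without a browser.

```python
def body_lacks(markers):
    """Build a wait condition that is true once none of `markers` is in <body>."""
    def condition(driver):
        # "tag name" is the locator strategy string behind By.TAG_NAME.
        text = driver.find_element("tag name", "body").text
        return not any(marker in text for marker in markers)
    return condition


# Hypothetical test doubles so the sketch can run without launching Chrome.
class FakeElement:
    def __init__(self, text):
        self.text = text


class FakeDriver:
    def __init__(self, text):
        self._text = text

    def find_element(self, by, value):
        return FakeElement(self._text)


past_challenge = body_lacks(["Checking your browser", "Ray ID"])
print(past_challenge(FakeDriver("Checking your browser before accessing ...")))
print(past_challenge(FakeDriver("The Cloud Cable-Knit Vest")))
```

With a real driver this would be used as `WebDriverWait(driver, 20).until(body_lacks(["Checking your browser", "Ray ID"]))`, which raises `TimeoutException` if the challenge text never goes away. Waiting only helps when the challenge eventually clears; if the browser has already been flagged as automated, the option changes shown in the answers above are still needed.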
      