Loading web page using headless Chrome and Selenium returns Debugging Information, IP Address Ray ID - CentOS

sm1994
December 30, 2021
154 views
2 votes
4 Answers

I am trying to create an application that scrapes certain e-commerce websites. I am using Selenium for this and deploying the application on an EC2 instance running CentOS. The code worked when I developed it locally, but it fails on the remote machine.

The code I am using:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

ser = Service(ChromeDriverManager().install())
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")

selenium_driver = webdriver.Chrome(service=ser, options=chrome_options)

url = 'https://www.everlane.com/products/womens-cloud-cable-knit-vest-oatmeal?collection=womens-newest-arrivals'

selenium_driver.get(url)

title = selenium_driver.find_element(By.XPATH, '//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span')
print(title.text)

When I run this code on the remote machine, I get an error with the following stack trace:

Traceback (most recent call last):
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2091, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2076, in wsgi_app
    response = self.handle_exception(e)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/ec2-user/price_tracker/flask_api.py", line 22, in home
    title, price, isSizeAvailable, shop = prices.checkInfoByShop(url, size)
  File "/home/ec2-user/price_tracker/check_prices.py", line 132, in checkInfoByShop
    secondaryPriceXPath=secondaryPriceXPath)
  File "/home/ec2-user/price_tracker/check_prices.py", line 61, in checkSelenium
    title = self.selenium_driver.find_element(By.XPATH, titleXPath)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 1246, in find_element
    'value': value})['value']
  File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span"}
  (Session info: headless chrome=96.0.4664.110)
Stacktrace:
#0 0x559979e8dee3 <unknown>
#1 0x55997995b608 <unknown>
#2 0x559979991aa1 <unknown>
#3 0x559979991c61 <unknown>
#4 0x5599799c4714 <unknown>
#5 0x5599799af29d <unknown>
#6 0x5599799c23bc <unknown>
#7 0x5599799af163 <unknown>
#8 0x559979984bfc <unknown>
#9 0x559979985c05 <unknown>
#10 0x559979ebfbaa <unknown>
#11 0x559979ed5651 <unknown>
#12 0x559979ec0b05 <unknown>
#13 0x559979ed6a68 <unknown>
#14 0x559979eb505f <unknown>
#15 0x559979ef1818 <unknown>
#16 0x559979ef1998 <unknown>
#17 0x559979f0ceed <unknown>
#18 0x7ff5dd53b40b <unknown>

For debugging purposes, I tried to read the entire body of the webpage using

body = selenium_driver.find_element(By.XPATH, '/html/body')
print(body.text)

which returns

"We're sorry, something has gone wrong. Please try again.\nIf you continue to have trouble, please contact us at [email protected]\nChecking your browser before accessing www.everlane.com.\nThis process is automatic. Your browser will redirect to your requested content shortly.\nPlease allow up to 5 seconds…\nDebugging Information\nIP Address\n<ip-address>\nRay ID\n6c57184d797805a0"

I understand that my request might be getting blocked for some reason but is there a way to bypass this?

I have tried adding wait statements in the hope of landing on the redirect but nothing has worked so far.

Answers

- ConorGrocock
- December 30, 2021 at 2:02 am
- 0 votes
That message suggests the page content changes after the initial load, so your code is working as intended. I'd have Selenium wait for the element to become visible. If you don't want to do that, you can instead wait for the page to redirect; how to do that is covered in another Stack Overflow question.
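As a rough illustration of "wait for the redirect", here is a small polling helper. It is only a sketch: `wait_for_text_to_clear` and its parameters are hypothetical names, not part of Selenium, and the marker strings are taken from the message in the question. It only helps if the interstitial actually goes away on its own.

```python
import time


def wait_for_text_to_clear(get_text, markers, timeout=30.0, interval=0.5):
    """Poll get_text() until none of the markers appear in the returned text.

    Returns the final text, or raises TimeoutError if a marker is still
    present once the timeout has expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        text = get_text()
        if not any(marker in text for marker in markers):
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError("interstitial still showing after %.1fs" % timeout)
        time.sleep(interval)


# Hypothetical usage with the driver from the question:
# body_text = wait_for_text_to_clear(
#     lambda: selenium_driver.find_element(By.TAG_NAME, "body").text,
#     markers=("Checking your browser", "Ray ID"),
# )
```

Passing a callable rather than the driver keeps the helper independent of Selenium, so it can be tried out with plain strings first.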


I’d suggest using WebDriverWait to wait for the element to load.

# 10 is an example timeout in seconds; adjust as needed
wait = WebDriverWait(selenium_driver, 10)
elem = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span')))
print(elem.text)

Imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC

Because of the message:

Checking your browser before accessing www.everlane.com.
This process is automatic. Your browser will redirect to your requested content shortly.

This site seems to have Cloudflare protection enabled.
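To fail fast with a clearer error than `NoSuchElementException`, the body text can be checked for the interstitial before looking up any elements. The following is a minimal sketch; the function names and the marker string are my own assumptions based on the message quoted in the question, and the wording of the interstitial may differ on other sites.

```python
import re


def is_challenge_page(body_text):
    """Return True if the text looks like the 'Checking your browser' interstitial."""
    return "Checking your browser" in body_text


def extract_ray_id(body_text):
    """Return the hexadecimal Ray ID from the text, or None if there is none."""
    match = re.search(r"Ray ID\s*([0-9a-f]+)", body_text)
    return match.group(1) if match else None


# Hypothetical usage with the driver from the question:
# text = selenium_driver.find_element(By.TAG_NAME, "body").text
# if is_challenge_page(text):
#     raise RuntimeError("Blocked by challenge page, Ray ID: %s" % extract_ray_id(text))
```

Logging the Ray ID makes it easier to tell blocked requests apart from pages where the XPath is simply wrong.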

I suggest trying selenium-stealth:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium_stealth import stealth

ser = Service(ChromeDriverManager().install())
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(service=ser, options=options)

stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        )

url = 'https://www.everlane.com/products/womens-cloud-cable-knit-vest-oatmeal?collection=womens-newest-arrivals'

driver.get(url)
title = driver.find_element(By.XPATH, '//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span')
print(title.text)


This message…

"We're sorry, something has gone wrong. Please try again.\nIf you continue to have trouble, please contact us at [email protected]\nChecking your browser before accessing www.everlane.com.\nThis process is automatic. Your browser will redirect to your requested content shortly.\nPlease allow up to 5 seconds…\nDebugging Information\nIP Address\n<ip-address>\nRay ID\n6c57184d797805a0"

…implies that the Chrome browsing context launched by Selenium's ChromeDriver was detected as a bot.

However, I was able to bypass the detection in headless Chrome using a few arguments as follows:

Code Block:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless")
options.add_argument("start-maximized")
# Set excludeSwitches once: a second call with the same name overwrites the first
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
# Raw string keeps the backslashes in the Windows path literal
s = Service(r'C:\BrowserDrivers\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get("https://www.everlane.com/products/womens-cloud-cable-knit-vest-oatmeal?collection=womens-newest-arrivals")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[@class='product-heading__name']/span"))).text)
driver.quit()

Console Output:
```
The Cloud Cable-Knit Vest
```
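If several scrapers share these settings, it may be convenient to keep them in one place. The sketch below is an assumption about how that could be organised, not part of the original answer: `apply_chrome_settings` is a hypothetical helper that works on any object exposing `add_argument` and `add_experimental_option`, such as Selenium's `ChromeOptions`. Because `add_experimental_option` stores one value per name, both switches to exclude are listed together.

```python
CHROME_ARGUMENTS = [
    "--headless",
    "start-maximized",
    "--disable-blink-features=AutomationControlled",
]

EXPERIMENTAL_OPTIONS = {
    # One list per option name; a repeated name would replace the earlier value.
    "excludeSwitches": ["enable-automation", "enable-logging"],
    "useAutomationExtension": False,
}


def apply_chrome_settings(options):
    """Apply the arguments and experimental options to a ChromeOptions-like object."""
    for argument in CHROME_ARGUMENTS:
        options.add_argument(argument)
    for name, value in EXPERIMENTAL_OPTIONS.items():
        options.add_experimental_option(name, value)
    return options


# Hypothetical usage (requires selenium and a matching chromedriver):
# from selenium import webdriver
# driver = webdriver.Chrome(options=apply_chrome_settings(webdriver.ChromeOptions()))
```

Whether these flags are enough to get past the check depends on the site and may change over time.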
