I have a web scraping project in which I am trying to scrape some data from a web page. I chose a site called wykop.pl, which is roughly a Polish equivalent of Reddit.
My idea is that Selenium opens the page, accepts cookies, closes the ad (it doesn't appear 100% of the time), scrolls to the bottom of the page (optional; I don't think it's needed), and then clicks the next-page button using a CSS selector.
This is my code:
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

website = "https://wykop.pl/hity/roku/strona/1"
# relative XPath of the button that accepts cookies
cookies_button_xpath = "//button[contains(@class,'qxOn2zvg e1sXLPUy')]"

# chromepath is the path to the chromedriver executable (defined elsewhere)
service_chrome = Service(executable_path=chromepath)
options_chrome = webdriver.ChromeOptions()
driver_chrome = webdriver.Chrome(service=service_chrome, options=options_chrome)  # open Chrome
driver_chrome.maximize_window()  # maximize the browser window
driver_chrome.get(website)  # open the website
time.sleep(3)  # the site can be slow to load, so wait a couple of seconds

content = driver_chrome.find_element('xpath', cookies_button_xpath)  # find the cookies button
content.click()  # click it
# WORKS

# next_page_class_next = driver_chrome.find_element_by_css_selector("li.next")
# the call above was removed from Selenium; now it has to be done like this

# a CSS selector to target the next page button with the class "next"
next_page_css_selector = 'next > a'

try:
    # wait for the close button of the ad to be visible
    close_ad_button = WebDriverWait(driver_chrome, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "button[data-v-6fdb93ea]")))
    # if the ad appears, close it
    close_ad_button.click()
except Exception:
    # the ad did not appear
    pass

# scroll to the bottom of the page
driver_chrome.execute_script("window.scrollTo(0, document.body.scrollHeight);")
# wait for the next page button to be clickable
next_page = WebDriverWait(driver_chrome, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, next_page_css_selector))).click()
This is the error:
---------------------------------------------------------------------------
TimeoutException Traceback (most recent call last)
Cell In[27], line 47
45 driver_chrome.execute_script("window.scrollTo(0, document.body.scrollHeight);")
46 # wait for the next page button to be clickable
---> 47 next_page = WebDriverWait(driver_chrome, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, next_page_css_selector))).click()
File ~\miniconda3\envs\Piotrus\Lib\site-packages\selenium\webdriver\support\wait.py:105, in WebDriverWait.until(self, method, message)
103 if time.monotonic() > end_time:
104 break
--> 105 raise TimeoutException(message, screen, stacktrace)
TimeoutException: Message:
I have tried an XPath instead of the CSS selector; the problem is the same.
I have tried increasing the timeout from 10 seconds to 30, 50, and 70. Nothing worked.
I have tried other variations of the CSS selector, like
next_page_css_selector = "li.next > a"
That doesn't work either.
I know that the problem is on my side, and I think I'm close, because the cookies button, which I locate with an XPath, does get clicked.
I'd really appreciate it if you tried replicating the code to see what's wrong.
2 Answers
To get the links from the different pages, it is easier to use the site's Ajax pagination API, e.g.:
Prints:
The problem is that your CSS selector, next > a, is not correct.
This is looking for an HTML tag NEXT that has a child A tag. There is no NEXT HTML tag on the page.
The relevant HTML is an LI element with the class 'next' that contains the A link.
I think what you meant is li.next > a
This is looking for an HTML tag LI that has the class 'next' and a child A tag. This now matches the next link at the bottom of the page.
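The difference between matching on a tag name and matching on a class can be tried without a browser. The sketch below is only an illustration: the markup is a hypothetical stand-in for the page's pagination (the real HTML may differ), and it uses the limited XPath support in Python's xml.etree.ElementTree as an approximation of the two CSS selectors, with an exact match on the class attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup; the real page's HTML may differ.
SNIPPET = """
<ul class="paging">
  <li class="prev"><a href="/strona/1">prev</a></li>
  <li class="next"><a href="/strona/3">next</a></li>
</ul>
""".strip()

root = ET.fromstring(SNIPPET)

# Rough equivalent of CSS "next > a":
# <a> children of an element whose TAG NAME is "next".
by_tag = root.findall(".//next/a")

# Rough equivalent of CSS "li.next > a":
# <a> children of an <li> whose CLASS attribute is "next".
by_class = root.findall(".//li[@class='next']/a")

print(len(by_tag))               # 0, there is no <next> element
print(by_class[0].get("href"))   # /strona/3
```

With the tag-name form nothing matches, which is why the wait in the question runs out and raises TimeoutException no matter how long the timeout is.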
Additional feedback:
Instead of declaring a new WebDriverWait() instance each time you use it, create one and reuse it.
I would suggest that if you are only going to use a locator once, don't bother declaring it as a variable; pass it directly to the call that uses it. It keeps all your code together and makes it easier to read.
If you need to declare a locator, declare a By locator (the By type together with the string) instead of just a string. That way the locator string and the type are declared in the same place, making maintenance and reading/following the code easier.
Don't use time.sleep(). Instead, add a WebDriverWait when you need a wait.
As of Selenium 4.6, you no longer need to download and configure the driver. Selenium Manager does that for you now, so the Service(executable_path = chromepath) setup turns into a plain webdriver.Chrome() call.
This is just a personal preference, but name your driver driver, not driver_chrome. You aren't maintaining multiple drivers of different types, so there's no point in putting 'chrome' in the name. A plain driver is shorter and faster to type, and if you ever do change to Firefox or another browser, you won't need to rename the variable to match. Just keep it simple.