I’m trying to scrape all the team statistics at this page:
https://www.unitedrugby.com/clubs/glasgow-warriors/stats

As you can see, there are several drop-down menus. The ones I’m interested in are the six describing the 2022/23 statistics (Attack, Defence, Kicking, Discipline, Lineouts, Scrums).

I have inspected the page, and the item to click to open each of the six menus appears to have the following XPath: //div[@class='bg-white px-6 py-2 absolute left-1/2 -translate-x-1/2 -top-5 text-slate-deep uppercase text-2xl leading-5 font-step-1 font-urc-sans tracking-[2px] hover:cursor-pointer select-none'].

The Firefox inspector also shows "event" next to this particular line, so (since I’m not that skilled in Selenium yet) I assumed it was the element to click.

I have used the following piece of code to retrieve all elements with that class:

Elements = WebDriverWait(driver, 60).until(
    EC.element_to_be_clickable((By.XPATH, "//div[@class='bg-white px-6 py-2 absolute left-1/2 -translate-x-1/2 -top-5 text-slate-deep uppercase text-2xl leading-5 font-step-1 font-urc-sans tracking-[2px] hover:cursor-pointer select-none']"))
)

My idea was to find all these elements, wait for them to be clickable, then click them to open the dropdown menus, and scrape all the statistics contained inside.

Regardless of how long I allow it to wait, it always raises a TimeoutException.

Could anyone help me sort out this issue?

EDIT #1:

Thanks to the answers I have achieved the first step. However, my ultimate goal is to retrieve the actual statistics (e.g. "Points scored", inside "Attack").

These are all under the class flex justify-between items-center border-t border-mono-300 py-4 md:py-6.

After clicking on the cookies button and waiting for the presence of all elements (which works now), I am not able to retrieve elements with this class.

What I’m missing is how to open all six menus before scraping the statistics, because they don’t show up unless I click on the dropdown.

I’m doing this:

Elements = [el.click() for el in Elements]

Because I’m trying to click on each of the six WebElement instances returned by the previous wait.

I don’t think this is the way I’m supposed to do it, but I can’t find the right way. Any hints would be appreciated.

3 Answers


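  1. I just changed EC.element_to_be_clickable to EC.presence_of_all_elements_located, and it seems to be working. Note that element_to_be_clickable waits for a single element and returns one WebElement, whereas presence_of_all_elements_located returns a list of every match.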

    Check the working code below:

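    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    driver.maximize_window()
    driver.implicitly_wait(10)
    driver.get("https://www.unitedrugby.com/clubs/glasgow-warriors/stats")
    # Dismiss the cookie banner (the implicit wait gives the button time to appear)
    driver.find_element(By.XPATH, "//span[text()=' ACCEPT ALL ']").click()
    # Wait until every menu header is present in the DOM; this returns a list
    Elements = WebDriverWait(driver, 60).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='bg-white px-6 py-2 absolute left-1/2 -translate-x-1/2 -top-5 text-slate-deep uppercase text-2xl leading-5 font-step-1 font-urc-sans tracking-[2px] hover:cursor-pointer select-none']")))
    print(len(Elements))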
    

    Console output:

    6
    
    Process finished with exit code 0
    
  2. I opened https://www.unitedrugby.com/clubs/glasgow-warriors/stats and saw that it shows a modal cookie dialog that covers all of the page content.

    So the elements you are trying to reach are not actually clickable while the dialog is open.

    You need to dismiss this dialog first, like this:

    WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "//button/span[contains(., 'ACCEPT ALL')]"))).click()
    

    After that, your code will work as expected.

  3. To open each of the six menus, use WebDriverWait with visibility_of_all_elements_located() to collect the headers, then wait for each one to be clickable before clicking it. You can use the following locator strategy:

    driver.get("https://www.unitedrugby.com/clubs/glasgow-warriors/stats")
    # accepting the cookies
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//span[normalize-space()='ACCEPT ALL']"))).click()
    # list of elements to be clicked
    elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class, 'border-turquoise-primary')]/div[contains(@class, 'select-none')]/div/div[text()]")))
    # clicking on each element
    for element in elements:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable(element)).click()
    

    Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
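
For the follow-up in EDIT #1: once the six menus have been clicked open as in answer 3, the stat rows should be rendered and can be read with find_elements. The sketch below is one possible way to turn them into a dictionary. It assumes (this is not confirmed anywhere in the thread) that each row's visible text is a label followed by a value on separate lines. ROW_XPATH, parse_stat_row and collect_stats are hypothetical names introduced here. The XPath matches on two of the classes quoted in the question rather than the full class string, since an exact @class match stops working if the site adds or reorders a class.

```python
# Sketch only: the "label, then value" row layout is an assumption about the page.
ROW_XPATH = (
    "//div[contains(@class, 'justify-between') "
    "and contains(@class, 'border-mono-300')]"
)


def parse_stat_row(text):
    """Split one row's visible text into (label, value); None if it has fewer than two parts."""
    parts = [p.strip() for p in text.splitlines() if p.strip()]
    if len(parts) < 2:
        return None
    return parts[0], parts[-1]


def collect_stats(driver):
    """Read every stat row currently in the page and return {label: value}."""
    stats = {}
    # "xpath" is the string value of selenium's By.XPATH
    for row in driver.find_elements("xpath", ROW_XPATH):
        parsed = parse_stat_row(row.text)
        if parsed is not None:
            label, value = parsed
            stats[label] = value
    return stats
```

Call collect_stats(driver) after the click loop has finished. If two menus use the same label, the later row overwrites the earlier one in the dictionary, so a list of tuples may be a better fit in that case.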
    