I am attempting to scrape basketball-reference.com and am running into an issue I can't seem to solve. I am trying to grab the box score element for each game played. This was easy to do with urlopen, but because other portions of the site require Selenium, I thought I would rewrite the entire process with Selenium.

The issue seems to be that even if I use WebDriverWait to hold off scraping until I see the first element load, I get nothing back when I then try to grab the elements.

One thing I found interesting: if I print the full page from urlopen with something like print(uClient.read()), I get roughly 300 more lines of HTML (after beautifying) than I do from print(driver.page_source). That holds even with an implicit wait set to 5 minutes.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC



driver = webdriver.Chrome('/usr/local/bin/chromedriver')
driver.wait = WebDriverWait(driver, 10)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH,'//*[@id="content"]/div[3]/div[1]')))


box = driver.find_elements_by_class_name('game_summary expanded nohover')

print (box)

driver.quit()
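
A likely reason the list comes back empty is that a class-name locator expects a single class, while 'game_summary expanded nohover' is three classes separated by spaces. The sketch below is one way around that, assuming Selenium 4's find_elements(By.CSS_SELECTOR, ...) call; the helper names compound_class_selector and find_by_classes are hypothetical and not part of Selenium.

```python
def compound_class_selector(class_attr):
    """Turn a space-separated class attribute into a CSS selector.

    'a b c' becomes '.a.b.c', which matches elements carrying all
    three classes.
    """
    return "".join("." + name for name in class_attr.split())


def find_by_classes(driver, class_attr):
    """Return all elements that carry every class in class_attr.

    Assumes `driver` is a Selenium 4 WebDriver instance.
    """
    # Imported lazily so the selector helper works without Selenium installed.
    from selenium.webdriver.common.by import By

    return driver.find_elements(By.CSS_SELECTOR, compound_class_selector(class_attr))


selector = compound_class_selector("game_summary expanded nohover")
print(selector)
```

Unlike an XPath test on the exact class string, the CSS form matches regardless of the order of the classes or any extra classes on the element.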

2 Answers


  1. Try the code below; it works on my machine. Let me know if you still have a problem.

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    driver = webdriver.Chrome()
    driver.wait = WebDriverWait(driver, 60)
    driver.get('https://www.basketball-reference.com/boxscores/')
    driver.wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="content"]/div[3]/div[1]')))
    
    boxes = driver.wait.until(
        EC.presence_of_all_elements_located((By.XPATH, "//div[@class='game_summary expanded nohover']")))
    
    print("Number of Elements Located : ", len(boxes))
    
    for box in boxes:
        print(box.text)
        print("-----------")
    
    driver.quit()
    

    If it resolves your problem, please mark it as the answer. Thanks.

  2. Actually, the site doesn't require Selenium at all. All the data is available through a simple request (it's just in the comments of the HTML, so you would only need to parse those). Secondly, you can grab the box scores quite easily with pandas:

    import pandas as pd
    
    dfs = pd.read_html('https://www.basketball-reference.com/boxscores/')
    
    for idx, table in enumerate(dfs[:-2]):
        print (table)
        if (idx+1)%3 == 0:
            print("-----------")
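
For the "parse the comments" route mentioned above, here is a minimal sketch using only the standard library's html.parser. The SAMPLE_HTML string is invented for illustration and is not the site's real markup, and the class and function names are hypothetical. In practice you would feed in the downloaded page text instead.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the raw text of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


class TableCellCollector(HTMLParser):
    """Collect the text of <td>/<th> cells, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._in_cell = True
            self.rows[-1].append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


# Invented sample: a table hidden inside an HTML comment.
SAMPLE_HTML = """
<div id="content">
  <p>Visible text</p>
  <!--
  <table id="hidden_scores">
    <tr><th>Team</th><th>PTS</th></tr>
    <tr><td>Home</td><td>101</td></tr>
    <tr><td>Away</td><td>99</td></tr>
  </table>
  -->
</div>
"""


def rows_from_commented_tables(html):
    """Return table rows found inside HTML comments of `html`."""
    collector = CommentCollector()
    collector.feed(html)
    rows = []
    for comment in collector.comments:
        if "<table" not in comment:
            continue  # skip comments that hold no table markup
        cells = TableCellCollector()
        cells.feed(comment)
        rows.extend(cells.rows)
    return rows


print(rows_from_commented_tables(SAMPLE_HTML))
```

The same two-pass idea (pull out the comment text, then parse it as HTML) should carry over to other parsers if you already use one.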
    