
So I’m doing a bit of web scraping with Selenium, and I’ve found the class of the elements I want to target. The problem is that it only partly works: the call returns some of the elements on the page that have this class, but not all of them.

 driver.find_elements(By.CLASS_NAME,"_3sf33-9rVAO_v4y0pIW_CH")

This returns only some of the elements that have the same class name but not all of them.

This is a screenshot of what I believe causes the error

The tag above shows the last tag that is found by the call and the one below is where they are no longer found.

All the divs have the same structure until the highlighted div and then they change to the structure of the highlighted div.

As you can see they both have the same class so I’m not sure what is causing this to occur.

This is what I believe causes the error: the div’s header changes the order in which the keywords are listed. That still doesn’t make sense to me, though. How could the order of the words change whether the element is found or not?

I have also tried the following, because it was an answer to a similar question:

elements = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "_3sf33-9rVAO_v4y0pIW_CH")))

This did not change the result, which makes me believe that it has nothing to do with page loading and is instead caused by the changed order.

Also I’m sorry if my terminology or wording is incorrect, I honestly am not sure how to describe these things.

This is the minimum reproducible example:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver import ChromeOptions
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = ChromeOptions()
driver = webdriver.Chrome(options=options)

driver.get('https://www.reddit.com/r/AskReddit/comments/11vps60/what_do_you_consider_a_holy_trinity/')

elements = driver.find_elements(By.CLASS_NAME,"_3sf33-9rVAO_v4y0pIW_CH")

print(len(elements))
# Currently this only finds 37, but it should find more than 100, or preferably every element on the page that matches this class
                

The link is in the code and it stops working at the comment that says "Remember: the faster you beat an encounter the more damage you mitigate, so a glare mage is just a preemptive healer".

2 Answers


  1. The issue here is that all the comments take a while to load after the page does. There’s some sort of background process that continues to load the elements. I wrote a simple method that just waits until the count of comments stabilizes and then returns the collection of elements.

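    import time

    def wait_for_comments(locator, poll_interval=0.5):
        """Poll until the number of matching elements stops changing."""
        num_comments = -1  # sentinel, so an initial count of 0 is not treated as stable
        while True:
            # locator is a (By, value) tuple, so unpack it for find_elements
            e = driver.find_elements(*locator)
            if len(e) == num_comments:
                return e
            num_comments = len(e)
            time.sleep(poll_interval)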
    

    Then your script would look like

    comments = wait_for_comments((By.CSS_SELECTOR, "._3sf33-9rVAO_v4y0pIW_CH"))
    print(len(comments))
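
A possible variation on the polling loop above, as a rough sketch rather than a tested fix for this page: it stops only after the count has stayed the same for several polls in a row, and it gives up after a timeout so it cannot loop forever. The names `wait_for_stable_count`, `find`, `stable_polls` and `timeout` are made up for this example. `find` is any zero-argument callable that returns the current list of elements.

```python
import time


def wait_for_stable_count(find, poll_interval=0.5, stable_polls=3, timeout=30.0):
    """Call find() until the length of its result is unchanged for
    stable_polls consecutive polls, or until timeout seconds have passed.
    Returns the most recent result either way."""
    deadline = time.monotonic() + timeout
    last_count = -1  # sentinel: no poll has happened yet
    unchanged = 0    # consecutive polls with the same count
    while True:
        found = find()
        if len(found) == last_count:
            unchanged += 1
        else:
            last_count = len(found)
            unchanged = 0
        if unchanged >= stable_polls or time.monotonic() >= deadline:
            return found
        time.sleep(poll_interval)


# Hypothetical usage with Selenium (assumes `driver` and `By` from the question):
# comments = wait_for_stable_count(
#     lambda: driver.find_elements(By.CSS_SELECTOR, "._3sf33-9rVAO_v4y0pIW_CH")
# )
```

Requiring more than one unchanged poll makes it less likely that the loop returns during a short pause between two batches of comments, at the cost of a slightly longer wait.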
    
  2. I guess you need to scroll down a few times to load more content:

    from time import sleep
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver import ChromeOptions
    
    options = ChromeOptions()
    
    options.add_argument("--start-maximized")
    options.add_experimental_option(
        "prefs",
        {
            "credentials_enable_service": False,
            "profile.password_manager_enabled": False,
            "profile.default_content_setting_values.notifications": 2
            # 2 blocks notifications, 1 allows them
        },
    )
    
    driver = webdriver.Chrome(options=options)
    driver.get('https://www.reddit.com/r/AskReddit/comments/11vps60/what_do_you_consider_a_holy_trinity/')
    
    for _ in range(5):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(2)
    
    elements = driver.find_elements(By.CLASS_NAME, "_3sf33-9rVAO_v4y0pIW_CH")
    print(len(elements))
    

    The output will vary depending on how much content has loaded (roughly 300):

    312
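
Instead of a fixed five scrolls, the loop could keep scrolling until the page height stops growing. This is only a sketch: it assumes that `document.body.scrollHeight` increases whenever more content loads, which I have not verified for this page. The helper name `scroll_until_stable` and its `pause` and `max_scrolls` parameters are made up for the example. It only relies on the driver's `execute_script` method.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_scrolls=20):
    """Scroll to the bottom repeatedly until document.body.scrollHeight
    stops growing or max_scrolls is reached. Returns the number of scrolls."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was added by the last scroll
        last_height = new_height
    return scrolls


# Hypothetical usage, replacing the `for _ in range(5)` loop above:
# scroll_until_stable(driver)
# elements = driver.find_elements(By.CLASS_NAME, "_3sf33-9rVAO_v4y0pIW_CH")
```

The `max_scrolls` cap is there so that a page which keeps growing cannot keep the script running indefinitely.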
    