Python Selenium won't select all image tags - SEO

z3y50n
December 24, 2020
149 views
1 vote
2 Answers

I am trying to crawl Product Hunt using Selenium.

More specifically, I am trying to get the source link for all the product icons.

HTML:

My code for crawling is the following:

import time

from selenium import webdriver

driver = webdriver.Chrome("<Your driver's path>")
driver.get("https://www.producthunt.com/topics/seo-tools?order=most-upvoted")
time.sleep(4)  # fixed wait for the initial page load
icons = driver.find_elements_by_css_selector("div.styles_thumbnail__d2DAK.styles_thumbnail__XBHZ_ img")
print(len(icons))
print(icons)
driver.close()

The problem is that Selenium only gets the first 3 pictures, not the icons of all the available products.

I have tried increasing the sleep time, and I have also used WebDriverWait with EC.presence_of_all_elements_located to make sure that all icons are loaded properly.

Answers

Since the other icons only appear when you scroll to the bottom of the page, you can do it like this:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.producthunt.com/topics/seo-tools?order=most-upvoted")

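expected_number_of_icons = 20

icons = []
while True:
    # Scroll to the bottom so the page loads the next batch of products.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    icons = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@data-test, 'post-item')]//div[@class='styles_thumbnail__d2DAK styles_thumbnail__XBHZ_']//img | //div[contains(@class, 'styles_link')]//span[@class='lazyload-wrapper']/img")))
    # Drop duplicates while keeping page order (a plain set would shuffle the elements).
    icons = list(dict.fromkeys(icons))
    # Stop as soon as enough icons are loaded.
    if len(icons) >= expected_number_of_icons:
        break

# Keep only the requested number of icons.
icons = icons[:expected_number_of_icons]
# Read attributes such as src from the elements before this point; they are unusable once the browser is closed.
driver.close()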

Here you stop once you have reached the number of icons you want. If you end up with more than you need, for example 210 icons when you only want 200, you can discard the extra elements at the end of the list.
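Note that the loop above never exits if the page holds fewer icons than requested. A minimal sketch of one way to guard against that follows, assuming you pass in two callables of your own: `scroll`, which scrolls the page, and `collect`, which returns the elements found so far. The names `collect_until`, `max_stalled_rounds` and `pause` are hypothetical and not part of Selenium.

```python
import time


def collect_until(scroll, collect, target, max_stalled_rounds=3, pause=0.0):
    """Scroll and re-collect until `target` items are found or the page stops growing.

    `scroll` and `collect` are caller-supplied callables (hypothetical names):
    `scroll()` scrolls the page, `collect()` returns the items currently present.
    """
    items = []
    stalled = 0  # consecutive rounds in which no new items appeared
    while len(items) < target and stalled < max_stalled_rounds:
        scroll()
        time.sleep(pause)  # give lazily loaded content time to appear
        # De-duplicate while keeping page order.
        found = list(dict.fromkeys(collect()))
        if len(found) <= len(items):
            stalled += 1
        else:
            stalled = 0
        items = found
    # Return at most `target` items; fewer if the page ran out of content.
    return items[:target]
```

With Selenium, `scroll` could wrap `driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")` and `collect` could wrap `driver.find_elements(By.CSS_SELECTOR, "span.lazyload-wrapper > img")`, with a `pause` of a second or two so the next batch has time to load.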

To print the value of the src attribute, you can use either of the following locator strategies:

Using css_selector:

print([my_elem.get_attribute("src") for my_elem in driver.find_elements_by_css_selector("span.lazyload-wrapper > img")])

Using xpath:

print([my_elem.get_attribute("src") for my_elem in driver.find_elements_by_xpath("//span[@class='lazyload-wrapper']/img")])

Ideally, you should wait for the images with WebDriverWait and visibility_of_all_elements_located(), using either of the following locator strategies:

Using CSS_SELECTOR:

driver.get('https://www.producthunt.com/topics/seo-tools?order=most-upvoted')
print([my_elem.get_attribute("src") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "span.lazyload-wrapper > img")))])

Using XPATH in a single line:

driver.get('https://www.producthunt.com/topics/seo-tools?order=most-upvoted')
print([my_elem.get_attribute("src") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='lazyload-wrapper']/img")))])

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
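The list comprehensions above print whatever `get_attribute("src")` returns, which is `None` when an element has no such attribute. A small sketch of how the collected values could be tidied before use is shown here. It assumes that lazily loaded images may temporarily carry an inline `data:` placeholder instead of a real URL, which is a common pattern but has not been verified for this page; `clean_sources` is a hypothetical helper name.

```python
def clean_sources(raw_sources):
    """Return unique, non-placeholder image URLs in their original order."""
    seen = set()
    cleaned = []
    for src in raw_sources:
        # Skip missing or empty values and inline data: placeholders
        # (the placeholder case is an assumption about lazy loading).
        if not src or src.startswith("data:"):
            continue
        if src not in seen:
            seen.add(src)
            cleaned.append(src)
    return cleaned
```

You would pass it the list built by any of the comprehensions above, for example `clean_sources([e.get_attribute("src") for e in elements])`.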
