I am trying to crawl Product Hunt using Selenium.
More specifically, I am trying to get the source link for the icons of all the products.
HTML:
My code for crawling is the following:
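import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(service=Service("<Your driver's path>"))
driver.get("https://www.producthunt.com/topics/seo-tools?order=most-upvoted")
time.sleep(4)
# find_elements_by_css_selector was removed in Selenium 4.3; use find_elements(By.CSS_SELECTOR, ...)
icons = driver.find_elements(By.CSS_SELECTOR, "div.styles_thumbnail__d2DAK.styles_thumbnail__XBHZ_ img")
print(len(icons))
print(icons)
driver.quit()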
The problem is that Selenium only finds the first 3 pictures, not the icons of all the available products.
I have tried increasing the sleep time, and I have also used an explicit wait (WebDriverWait) with EC.presence_of_all_elements_located to make sure all icons are loaded properly.
2 Answers
Since the remaining icons only load when you scroll to the bottom of the page, you can keep scrolling and stop once you have collected the number of icons you want. For example, if you reach 210 icons but want only 200, you can discard the last 10 elements of the list.
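A minimal sketch of that scrolling approach, assuming a Selenium 4 WebDriver and reusing the selector from the question (its hashed class names may change whenever Product Hunt redeploys). The helper name `scroll_until_count` and its `pause`/`max_rounds` parameters are illustrative, not part of Selenium. It also stops early when a scroll brings in no new icons, and trims any surplus as described above.

```python
import time

# Selector taken from the question; the hashed class names may change at any time.
ICON_SELECTOR = "div.styles_thumbnail__d2DAK.styles_thumbnail__XBHZ_ img"


def scroll_until_count(driver, selector, target, pause=2.0, max_rounds=50):
    """Scroll to the bottom repeatedly until at least `target` elements match.

    `driver` is assumed to be a Selenium 4 WebDriver. "css selector" is the
    string value of selenium.webdriver.common.by.By.CSS_SELECTOR.
    Returns at most `target` elements, discarding any surplus.
    """
    elements = driver.find_elements("css selector", selector)
    rounds = 0
    while len(elements) < target and rounds < max_rounds:
        previous = len(elements)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the lazily loaded products time to render
        elements = driver.find_elements("css selector", selector)
        rounds += 1
        if len(elements) == previous:
            break  # nothing new appeared, so assume the list is exhausted
    return elements[:target]
```

With a real browser you would pass the driver from the question, e.g. `icons = scroll_until_count(driver, ICON_SELECTOR, 200)`, before calling `driver.quit()`. If the page loads slowly, raise `pause`: a pause that is too short can make the loop stop before the next batch of products arrives.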
To print the value of the src attribute, you can use either of two Locator Strategies: css_selector or xpath.
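A minimal sketch of both strategies, assuming a Selenium 4 driver whose page (including any scrolling) has already loaded. The XPath is a hand-translated equivalent of the question's CSS selector, and `print_icon_sources` is a hypothetical helper name. The strings "css selector" and "xpath" are the values of `By.CSS_SELECTOR` and `By.XPATH`.

```python
ICON_CSS = "div.styles_thumbnail__d2DAK.styles_thumbnail__XBHZ_ img"
# Equivalent XPath: any <img> below a <div> carrying both hashed classes.
ICON_XPATH = ("//div[contains(@class, 'styles_thumbnail__d2DAK') and "
              "contains(@class, 'styles_thumbnail__XBHZ_')]//img")


def print_icon_sources(driver, by="css selector", locator=ICON_CSS):
    """Print and return the src attribute of every matching <img>."""
    sources = [img.get_attribute("src") for img in driver.find_elements(by, locator)]
    for src in sources:
        print(src)
    return sources
```

Call it as `print_icon_sources(driver)` for the CSS selector or `print_icon_sources(driver, "xpath", ICON_XPATH)` for the XPath. Without scrolling first, it will only see the icons that have already been loaded.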
Ideally, you should induce WebDriverWait for visibility_of_all_elements_located(), and again you can use either a CSS_SELECTOR or an XPATH Locator Strategy; the XPATH version can be written in a single line.
Note: you have to import WebDriverWait, By and expected_conditions (conventionally aliased as EC).
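A sketch of the explicit-wait version, assuming Selenium 4. The helper `wait_for_icon_sources` and its 20-second default timeout are illustrative choices, and the Selenium imports are placed inside the function only so this snippet can be loaded without Selenium installed. Note that `visibility_of_all_elements_located` waits only for elements already present in the DOM; it does not trigger Product Hunt's lazy loading, so on its own it will likely still return just the first few icons unless you scroll first.

```python
ICON_CSS = "div.styles_thumbnail__d2DAK.styles_thumbnail__XBHZ_ img"
ICON_XPATH = ("//div[contains(@class, 'styles_thumbnail__d2DAK') and "
              "contains(@class, 'styles_thumbnail__XBHZ_')]//img")


def wait_for_icon_sources(driver, use_xpath=False, timeout=20):
    """Wait until the matching icons are visible, then return their src values.

    Raises selenium.common.exceptions.TimeoutException if no matching
    element becomes visible within `timeout` seconds.
    """
    # Local imports so the snippet itself loads even where Selenium is absent.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, ICON_XPATH) if use_xpath else (By.CSS_SELECTOR, ICON_CSS)
    icons = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located(locator))
    return [icon.get_attribute("src") for icon in icons]
```

In a regular script you would normally move the three imports to the top of the file alongside `from selenium import webdriver`.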