I am learning how to scrape websites, so I am deliberately not using an API. I am trying to scrape eBay’s search results, but my script prints every URL twice. I did my due diligence and searched Google and Stack Overflow, but I was unable to find a solution. Thanks in advance.
driver.get('https://www.ebay.com/sch/i.html?_from=R40&_nkw=watches&_sacat=0&_pgn=' + str(i))
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.maximize_window()
tempList = []
for link in soup.find_all('a', href=True):
    if 'itm' in link['href']:
        print(link['href'])
        tempList.append(link['href'])
Entire code: https://pastebin.com/q41eh3Q6
2 Answers
Just add the class name when searching for all the links. Hope this helps.
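A minimal sketch of what filtering by class name could look like. The HTML below is a simplified, made-up stand-in for a result card, and the class names `s-item__link` and `s-item__image-link` are assumptions; check the live page in the browser's inspector before relying on them. The idea is that each listing likely has two anchors (image and title) pointing at the same `itm` URL, so matching every `<a>` yields each URL twice.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for eBay result cards (assumed markup, not the real page):
# the image and the title are separate <a> tags pointing at the same listing.
SAMPLE_HTML = """
<ul>
  <li class="s-item">
    <a class="s-item__image-link" href="https://www.ebay.com/itm/111"><img alt=""></a>
    <a class="s-item__link" href="https://www.ebay.com/itm/111">Watch A</a>
  </li>
  <li class="s-item">
    <a class="s-item__image-link" href="https://www.ebay.com/itm/222"><img alt=""></a>
    <a class="s-item__link" href="https://www.ebay.com/itm/222">Watch B</a>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Approach from the question: every anchor whose href contains 'itm'.
# This picks up both anchors in each card, so every URL appears twice.
all_links = [a["href"] for a in soup.find_all("a", href=True) if "itm" in a["href"]]

# Restricting the search to one class keeps a single anchor per card.
title_links = [a["href"] for a in soup.find_all("a", class_="s-item__link", href=True)]

print(len(all_links))
print(title_links)
```

In the Selenium script, the same `class_=` argument can be passed to the existing `soup.find_all(...)` call. If the class names turn out to be different, removing duplicates afterwards with `list(dict.fromkeys(tempList))` is a fallback that keeps the original order.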
You’re looking for this:
Code and full example in the online IDE:
Alternatively, you can achieve the same thing by using the eBay Organic Results API from SerpApi. It’s a paid API with a free plan.
The difference in your case is that you don’t have to deal with the extraction process or maintain it over time. Instead, you only need to iterate over structured JSON and get the data you want.
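To illustrate what "iterating over structured JSON" means in practice, here is a rough sketch that works on a hard-coded sample instead of calling any API. The response shape, including the `organic_results`, `title` and `link` keys, is an assumption made for illustration; consult the provider's documentation for the actual field names.

```python
import json

# Hypothetical response body. The "organic_results", "title" and "link"
# keys are assumed for illustration and may differ from the real API.
SAMPLE_RESPONSE = """
{
  "organic_results": [
    {"title": "Watch A", "link": "https://www.ebay.com/itm/111"},
    {"title": "Watch B", "link": "https://www.ebay.com/itm/222"},
    {"title": "Entry without a link"}
  ]
}
"""


def extract_links(payload):
    """Return the link of every result that has one, in response order."""
    return [
        result["link"]
        for result in payload.get("organic_results", [])
        if "link" in result
    ]


links = extract_links(json.loads(SAMPLE_RESPONSE))
print(links)
```

There is no HTML parsing and no CSS selector to maintain here: each listing is already a single record, so the duplicate-anchor problem does not come up.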
Code to integrate:
P.S. I wrote a more in-depth blog post about how to scrape eBay search with Python.