BeautifulSoup is returning double links - eBay API

kay
December 27, 2019
101 views
0 votes
2 Answers

I am trying to learn how to scrape websites, so I am not using an API. I am scraping eBay’s search results, but my script prints every URL twice. I did my due diligence and searched Google and Stack Overflow for help, but I was unable to find a solution. Thanks in advance.

driver.get('https://www.ebay.com/sch/i.html?_from=R40&_nkw=watches&_sacat=0&_pgn=' + str(i))
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.maximize_window()

tempList = []

for link in soup.find_all('a', href=True):
    if 'itm' in link['href']:
        print(link['href'])
        tempList.append(link['href'])

Entire code: https://pastebin.com/q41eh3Q6

Tags: beautifulsoup python

Answers

Just add the class name when searching for the links. Hope this helps.

i=1
driver.get('https://www.ebay.com/sch/i.html?_from=R40&_nkw=watches&_sacat=0&_pgn=' + str(i))
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.maximize_window()

tempList = []

for link in soup.find_all('a', class_='s-item__link', href=True):
    if 'itm' in link['href']:
        print(link['href'])
        tempList.append(link['href'])

print(len(tempList))
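
If some listings still show up more than once after filtering by class, a fallback is to de-duplicate the collected URLs while keeping their order. The sketch below is only an illustration: `dedupe_links` is a hypothetical helper, and it assumes that two URLs with the same host and path (for example `/itm/203611966827`) point at the same listing, so the query string is ignored when comparing.

```python
from urllib.parse import urlsplit


def dedupe_links(links):
    """Return links in first-seen order, one per listing path (hypothetical helper)."""
    seen = set()
    unique = []
    for href in links:
        parts = urlsplit(href)
        # Assumption: host + path identify a listing; the query string is ignored.
        key = (parts.netloc, parts.path)
        if key not in seen:
            seen.add(key)
            unique.append(href)
    return unique


# Sample input: the first URL appears twice, as in the question.
sample = [
    "https://www.ebay.com/itm/203611966827?hash=item2f68380d6b:g:pBAAAOSw1~NhRy4Y",
    "https://www.ebay.com/itm/203611966827?hash=item2f68380d6b:g:pBAAAOSw1~NhRy4Y",
    "https://www.ebay.com/itm/133887696438?hash=item1f2c541e36:g:U3IAAOSwBKthN4yg",
]
print(dedupe_links(sample))
```

Calling `dedupe_links(tempList)` after the loop would then leave one entry per listing.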

You’re looking for this:

# container with needed data: title, link, price, condition, number of reviews, etc.
for item in soup.select('.s-item__wrapper.clearfix'):

    # only link will be extracted from the container
    link = item.select_one('.s-item__link')['href']

Code and full example in the online IDE:

from bs4 import BeautifulSoup
import requests, lxml

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

html = requests.get('https://www.ebay.com/sch/i.html?_nkw=watches', headers=headers).text
soup = BeautifulSoup(html, 'lxml')

temp_list = []

for item in soup.select('.s-item__wrapper.clearfix'):
    link = item.select_one('.s-item__link')['href']
    temp_list.append(link)
    print(link)

------------
'''
https://www.ebay.com/itm/203611966827?hash=item2f68380d6b:g:pBAAAOSw1~NhRy4Y
https://www.ebay.com/itm/133887696438?hash=item1f2c541e36:g:U3IAAOSwBKthN4yg
https://www.ebay.com/itm/154561925393?epid=26004285120&hash=item23fc9bd111:g:TWUAAOSwf3pgNP08
https://www.ebay.com/itm/115010872425?hash=item1ac72ea469:g:yQsAAOSweMBhT4gs
https://www.ebay.com/itm/115005461839?epid=1776383383&hash=item1ac6dc154f:g:QskAAOSwDe9hS7Ys
https://www.ebay.com/itm/224515689673?hash=item34462d8cc9:g:oTwAAOSwAO5gna8u
https://www.ebay.com/itm/124919898822?hash=item1d15ce62c6:g:iEoAAOSwhAthQnX9
https://www.ebay.com/itm/133886767671?hash=item1f2c45f237:g:htkAAOSwNAhhQOyf
https://www.ebay.com/itm/115005341920?hash=item1ac6da40e0:g:4SIAAOSwWi1hR5Mx
...
'''
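
The request above fetches a single results page. The URL in the question passes the page number in `_pgn`, so, assuming that parameter still behaves the same way, the per-page URLs could be built roughly like this. `build_search_url` and `BASE_URL` are hypothetical names introduced for the sketch.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebay.com/sch/i.html"


def build_search_url(keyword, page):
    """Build a search URL for one results page (hypothetical helper)."""
    # _nkw is the search keyword and _pgn the page number, as in the question's URL.
    return BASE_URL + "?" + urlencode({"_nkw": keyword, "_pgn": page})


# Print the URLs for the first three pages.
for page in range(1, 4):
    print(build_search_url("watches", page))
```

Each URL can then be passed to `requests.get` with the same headers and parsed as shown above.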

Alternatively, you can achieve the same thing with the eBay Organic Results API from SerpApi. It’s a paid API with a free plan.

The difference in your case is that you don’t have to write the extraction code and maintain it over time. Instead, you only need to iterate over structured JSON and pick out the data you want.

Code to integrate:

from serpapi import GoogleSearch
import os

params = {
    "engine": "ebay",
    "ebay_domain": "ebay.com",
    "_nkw": "watches",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

temp_list = []

for result in results['organic_results']:
    link = result['link']
    temp_list.append(link)
    print(link)

------------
'''
https://www.ebay.com/itm/203611966827?hash=item2f68380d6b:g:pBAAAOSw1~NhRy4Y
https://www.ebay.com/itm/133887696438?hash=item1f2c541e36:g:U3IAAOSwBKthN4yg
https://www.ebay.com/itm/154561925393?epid=26004285120&hash=item23fc9bd111:g:TWUAAOSwf3pgNP08
https://www.ebay.com/itm/115010872425?hash=item1ac72ea469:g:yQsAAOSweMBhT4gs
https://www.ebay.com/itm/115005461839?epid=1776383383&hash=item1ac6dc154f:g:QskAAOSwDe9hS7Ys
https://www.ebay.com/itm/224515689673?hash=item34462d8cc9:g:oTwAAOSwAO5gna8u
https://www.ebay.com/itm/124919898822?hash=item1d15ce62c6:g:iEoAAOSwhAthQnX9
https://www.ebay.com/itm/133886767671?hash=item1f2c45f237:g:htkAAOSwNAhhQOyf
https://www.ebay.com/itm/115005341920?hash=item1ac6da40e0:g:4SIAAOSwWi1hR5Mx
...
'''
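
The loop above indexes `results['organic_results']` directly, which raises a `KeyError` if that key is missing from the response. A more defensive variant is sketched below. `extract_links` is a hypothetical helper, and `sample_response` is made-up data that only mimics the `organic_results` / `link` shape used above; it is not real API output.

```python
def extract_links(results):
    """Collect 'link' values from a response dict, skipping entries without one."""
    links = []
    # .get() falls back to an empty list when 'organic_results' is absent.
    for result in results.get("organic_results", []):
        link = result.get("link")
        if link:
            links.append(link)
    return links


# Made-up data for illustration only.
sample_response = {
    "organic_results": [
        {"title": "Watch A", "link": "https://www.ebay.com/itm/1"},
        {"title": "Watch B"},  # no link, so this entry is skipped
        {"title": "Watch C", "link": "https://www.ebay.com/itm/3"},
    ]
}

print(extract_links(sample_response))
print(extract_links({}))
```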

P.S. I wrote a more in-depth blog post about how to scrape eBay search with Python.

Disclaimer: I work for SerpApi.
