I am learning how to scrape websites, so I am deliberately not using an API. I am trying to scrape eBay’s search results, but my script prints every URL twice. I did my due diligence and searched Google and Stack Overflow, but could not find a solution. Thanks in advance.

driver.get('https://www.ebay.com/sch/i.html?_from=R40&_nkw=watches&_sacat=0&_pgn=' + str(i))
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.maximize_window()

tempList = []

for link in soup.find_all('a', href=True):
    if 'itm' in link['href']:
        print(link['href'])
        tempList.append(link['href'])

Entire code: https://pastebin.com/q41eh3Q6

2 Answers


  1. Just pass the class name when searching for the links, so that only anchors with the s-item__link class are matched. Hope this helps.

    i=1
    driver.get('https://www.ebay.com/sch/i.html?_from=R40&_nkw=watches&_sacat=0&_pgn=' + str(i))
    soup = BeautifulSoup(driver.page_source, 'lxml')
    driver.maximize_window()
    
    tempList = []
    
    for link in soup.find_all('a',class_='s-item__link', href=True):
        if 'itm' in link['href']:
            print(link['href'])
            tempList.append(link['href'])
    
    print(len(tempList))
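
If the class names ever change, another option is to de-duplicate the collected URLs afterwards. Below is a minimal sketch, assuming duplicates share the same /itm/<id> path; the helper name unique_item_links and the sample URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def unique_item_links(hrefs):
    """Keep the first URL seen for each /itm/<id> path, preserving order."""
    seen = {}
    for href in hrefs:
        path = urlsplit(href).path  # e.g. "/itm/111", query string excluded
        if "/itm/" not in path:
            continue  # skip links that are not item pages
        seen.setdefault(path, href)  # only the first occurrence is stored
    return list(seen.values())


# Hypothetical sample data standing in for tempList from the question.
temp_list = [
    "https://www.ebay.com/itm/111?hash=a",
    "https://www.ebay.com/itm/111?hash=a",
    "https://www.ebay.com/itm/222?hash=b",
    "https://www.ebay.com/itm/111?hash=c",
    "https://www.ebay.com/help",
]

unique = unique_item_links(temp_list)
print(unique)
```

Keying on the path rather than the full URL also collapses links that differ only in their query string, which may or may not be what you want.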
    
  2. You’re looking for this:

    # container with needed data: title, link, price, condition, number of reviews, etc.
    for item in soup.select('.s-item__wrapper.clearfix'):
    
        # only link will be extracted from the container
        link = item.select_one('.s-item__link')['href']
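
One thing to watch for: select_one() returns None when a container has no matching element, so indexing ['href'] directly would raise a TypeError. A minimal sketch of a guarded version follows; the HTML is a made-up stand-in, not eBay's real markup, and it uses the built-in 'html.parser' so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the real page is larger and may differ.
html = """
<ul>
  <li class="s-item__wrapper clearfix">
    <a href="https://www.ebay.com/itm/111"><img alt=""></a>
    <a class="s-item__link" href="https://www.ebay.com/itm/111">Watch A</a>
  </li>
  <li class="s-item__wrapper clearfix"><span>Placeholder without a link</span></li>
  <li class="s-item__wrapper clearfix">
    <a class="s-item__link" href="https://www.ebay.com/itm/222">Watch B</a>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

links = []
for item in soup.select(".s-item__wrapper.clearfix"):
    anchor = item.select_one(".s-item__link")
    # Skip containers that have no link element or no href attribute.
    if anchor is None or not anchor.get("href"):
        continue
    links.append(anchor["href"])

print(links)
```

Because only one .s-item__link is taken per container, the image anchor in the first listing does not produce a second copy of the URL.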
    

    Full code example:

    from bs4 import BeautifulSoup
    import requests  # the 'lxml' parser used below needs the lxml package installed
    
    headers = {
        "User-Agent":
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
    }
    
    html = requests.get('https://www.ebay.com/sch/i.html?_nkw=watches', headers=headers).text
    soup = BeautifulSoup(html, 'lxml')
    
    temp_list = []
    
    for item in soup.select('.s-item__wrapper.clearfix'):
        link = item.select_one('.s-item__link')['href']
        temp_list.append(link)
        print(link)
    
    # Output:
    '''
    https://www.ebay.com/itm/203611966827?hash=item2f68380d6b:g:pBAAAOSw1~NhRy4Y
    https://www.ebay.com/itm/133887696438?hash=item1f2c541e36:g:U3IAAOSwBKthN4yg
    https://www.ebay.com/itm/154561925393?epid=26004285120&hash=item23fc9bd111:g:TWUAAOSwf3pgNP08
    https://www.ebay.com/itm/115010872425?hash=item1ac72ea469:g:yQsAAOSweMBhT4gs
    https://www.ebay.com/itm/115005461839?epid=1776383383&hash=item1ac6dc154f:g:QskAAOSwDe9hS7Ys
    https://www.ebay.com/itm/224515689673?hash=item34462d8cc9:g:oTwAAOSwAO5gna8u
    https://www.ebay.com/itm/124919898822?hash=item1d15ce62c6:g:iEoAAOSwhAthQnX9
    https://www.ebay.com/itm/133886767671?hash=item1f2c45f237:g:htkAAOSwNAhhQOyf
    https://www.ebay.com/itm/115005341920?hash=item1ac6da40e0:g:4SIAAOSwWi1hR5Mx
    ...
    '''
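
The example above fetches a single page. To walk through several result pages as in the question, the URL can be built from a dictionary instead of string concatenation. This is a minimal sketch: the parameter names _nkw, _sacat and _pgn are taken from the question's URL, while the helper name search_url is made up.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebay.com/sch/i.html"


def search_url(keyword, page=1):
    """Build a search URL for the given keyword and result page."""
    query = {"_nkw": keyword, "_sacat": 0, "_pgn": page}
    return f"{BASE_URL}?{urlencode(query)}"  # urlencode escapes the values


# URLs for the first three result pages.
urls = [search_url("watches", page) for page in range(1, 4)]
for url in urls:
    print(url)
```

Each URL can then be passed to requests.get() (or driver.get()) in a loop, with the extracted links appended to the same list.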
    

    Alternatively, you can get the same result with the eBay Organic Results API from SerpApi. It’s a paid API with a free plan.

    The difference in your case is that you don’t have to write the extraction logic or maintain it over time; you only need to iterate over structured JSON and pick out the data you want.

    Code to integrate:

    from serpapi import GoogleSearch
    import os
    
    params = {
        "engine": "ebay",
        "ebay_domain": "ebay.com",
        "_nkw": "watches",
        "api_key": os.getenv("API_KEY"),
    }
    
    search = GoogleSearch(params)
    results = search.get_dict()
    
    temp_list = []
    
    for result in results['organic_results']:
        link = result['link']
        temp_list.append(link)
        print(link)
    
    # Output:
    '''
    https://www.ebay.com/itm/203611966827?hash=item2f68380d6b:g:pBAAAOSw1~NhRy4Y
    https://www.ebay.com/itm/133887696438?hash=item1f2c541e36:g:U3IAAOSwBKthN4yg
    https://www.ebay.com/itm/154561925393?epid=26004285120&hash=item23fc9bd111:g:TWUAAOSwf3pgNP08
    https://www.ebay.com/itm/115010872425?hash=item1ac72ea469:g:yQsAAOSweMBhT4gs
    https://www.ebay.com/itm/115005461839?epid=1776383383&hash=item1ac6dc154f:g:QskAAOSwDe9hS7Ys
    https://www.ebay.com/itm/224515689673?hash=item34462d8cc9:g:oTwAAOSwAO5gna8u
    https://www.ebay.com/itm/124919898822?hash=item1d15ce62c6:g:iEoAAOSwhAthQnX9
    https://www.ebay.com/itm/133886767671?hash=item1f2c45f237:g:htkAAOSwNAhhQOyf
    https://www.ebay.com/itm/115005341920?hash=item1ac6da40e0:g:4SIAAOSwWi1hR5Mx
    ...
    '''
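
If a response might lack the 'organic_results' key (for example when a search returns nothing), the loop can be made a bit more tolerant. This is a minimal sketch that runs on a hand-written dictionary: only the 'organic_results' and 'link' keys come from the code above, and everything else in the sample is made up.

```python
def extract_links(results):
    """Collect the 'link' of every organic result, skipping entries without one."""
    links = []
    for result in results.get("organic_results", []):
        link = result.get("link")
        if link:
            links.append(link)
    return links


# Hypothetical response shape for illustration, not real API output.
sample = {
    "organic_results": [
        {"title": "Watch A", "link": "https://www.ebay.com/itm/111"},
        {"title": "Entry without a link"},
        {"title": "Watch B", "link": "https://www.ebay.com/itm/222"},
    ]
}

print(extract_links(sample))
print(extract_links({}))
```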
    

    P.S. I wrote a more in-depth blog post about how to scrape eBay search with Python.

    Disclaimer: I work for SerpApi.
