
I’m trying to scrape sold items on eBay. This is the page I’m targeting:

https://www.ebay.com/sch/i.html?_from=R40&_nkw=oakley+sunglasses&_sacat=0&Brand=Oakley&rt=nc&LH_Sold=1&LH_Complete=1&_ipg=200&_oaa=1&_fsrp=1&_dcat=79720

Here is my code, where I load the HTML and convert it to a soup object:

    import requests
    from bs4 import BeautifulSoup as bs

    ebay_url = 'https://www.ebay.com/sch/i.html?_from=R40&_nkw=oakley+sunglasses&_sacat=0&Brand=Oakley&rt=nc&LH_Sold=1&LH_Complete=1&_ipg=200&_oaa=1&_fsrp=1&_dcat=79720'
    response = requests.get(ebay_url)

    soup = bs(response.text, 'html.parser')
    #print(soup.prettify())

I’m working on getting the titles, prices, and dates sold, and then loading them into a CSV file. Here is the code I have for the titles:

    title = soup.find_all("h3", "s-item__title s-item__title--has-tags")
    print(title)

    listing_titles = []

    for i in range(1, len(title)):
        listing_titles.append(title[i].text)

    print(listing_titles)

This just returns an empty list ([]). The soup object prints correctly, and the response code is 200. It seems like my code should work, and finding the price and sale date should be similar. I’m wondering if this is a job for Selenium. Hopefully someone can help! Thanks!

2 Answers


  1. First, you can find all the result div elements by class, then loop over them to get the title, price, and date sold:

    # [1:] skips the first node, which is a hidden placeholder rather than a real listing
    main_data = soup.find_all("div", class_="s-item__info clearfix")[1:]
    for i in main_data:
        print(i.find("span", class_="POSITIVE").get_text())                             # date sold
        print(i.find("h3", class_="s-item__title s-item__title--has-tags").get_text())  # title
        print(i.find("span", class_="s-item__price").get_text())                        # price
    

    Output:

    Sold  Aug 15, 2021
    Oakley A Wire 2.0  Sunglasses Brushed Thick Frames Green Lenses
    $185.00
    ...
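
    Since the question also asks about writing the results to a CSV file, here is a minimal sketch that collects the same three fields and saves them with the standard csv module (the file name and column names here are my own choice):

    import csv

    rows = []
    for i in main_data:
        rows.append({
            "title": i.find("h3", class_="s-item__title s-item__title--has-tags").get_text(),
            "price": i.find("span", class_="s-item__price").get_text(),
            "sold_date": i.find("span", class_="POSITIVE").get_text(),
        })

    # DictWriter writes each dict as one row under the matching column header
    with open("sold_items.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "price", "sold_date"])
        writer.writeheader()
        writer.writerows(rows)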
    
  2. The response may be empty because the request is being blocked: the default user-agent in the requests library is python-requests, which tells the website that a bot or script is sending the request. Check what user-agent you are sending.
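
    As a quick sanity check, you can see exactly what user-agent your script sends by requesting an echo service; httpbin.org is used here purely as an illustration:

    import requests

    # httpbin.org echoes the received user-agent back as JSON
    print(requests.get('https://httpbin.org/user-agent').json())
    # {'user-agent': 'python-requests/2.28.1'}  (version depends on your install)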

    An additional step, besides providing a browser user-agent, could be to rotate user-agents, for example switching between PC, mobile, and tablet, as well as between browsers, e.g. Chrome, Firefox, Safari, Edge, and so on.
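
    A minimal sketch of such rotation, assuming a small hand-picked pool of user-agent strings (the three below are only illustrative examples):

    import random
    import requests

    # illustrative pool: desktop Chrome, desktop Safari, and mobile Safari
    user_agents = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Safari/605.1.15",
        "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1",
    ]

    headers = {"User-Agent": random.choice(user_agents)}  # a fresh random pick per request
    response = requests.get("https://www.ebay.com/sch/i.html", headers=headers, timeout=30)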

    It is also possible to fetch results from all pages using pagination. One solution is an infinite while loop that tests for something (a button or element) whose absence causes it to exit.

    In our case, that is the presence of the next-page button on the page (the .pagination__next selector).


    from bs4 import BeautifulSoup
    import requests, lxml
    import pandas as pd
    
    # https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
        }
        
    params = {
        '_nkw': 'oakley sunglasses',      # search query (requests URL-encodes the space; a literal '+' would be sent as %2B)
        'LH_Sold': '1',                   # shows sold items
        '_pgn': 1                         # page number (int, incremented below)
    }
    
    data = []
    
    while True:
        page = requests.get('https://www.ebay.com/sch/i.html', params=params, headers=headers, timeout=30)
        soup = BeautifulSoup(page.text, 'lxml')
        
        print(f"Extracting page: {params['_pgn']}")
    
        print("-" * 10)
        
        for products in soup.select(".s-item__pl-on-bottom"):
            title = products.select_one(".s-item__title span").text
            price = products.select_one(".s-item__price").text
            try:
                sold_date = products.select_one(".s-item__title--tagblock .POSITIVE").text
            except AttributeError:  # select_one() found nothing and returned None
                sold_date = None
            
            data.append({
              "title" : title,
              "price" : price,
              "sold_date": sold_date
            })
    
        if soup.select_one(".pagination__next"):
            params['_pgn'] += 1
        else:
            break
        
    # save the collected rows to CSV (requires pandas, imported above as pd)
    pd.DataFrame(data=data).to_csv("ebay_products.csv", index=False)
    

    Output:
    file is created: "ebay_products.csv"


    As an alternative, you can use Ebay Organic Results API from SerpApi. It’s a paid API with a free plan that handles blocks and parsing on their backend.

    Example code:

    from serpapi import EbaySearch
    import os
    import pandas as pd
    
    params = {
        "api_key": os.getenv("API_KEY"),      # serpapi key, https://serpapi.com/manage-api-key
        "engine": "ebay",                     # search engine
        "ebay_domain": "ebay.com",            # ebay domain
        "_nkw": "oakley sunglasses",          # search query
        "LH_Sold": "1",                       # shows sold items
        "_pgn": 1                             # page number (must exist here so it can be incremented below)
    }
    
    search = EbaySearch(params)        # where data extraction happens
    
    page_num = 0
    
    data = []
    
    while True:
        results = search.get_dict()     # JSON -> Python dict
    
        if "error" in results:
            print(results["error"])
            break
        
        for organic_result in results.get("organic_results", []):
            title = organic_result.get("title")
            price = organic_result.get("price")
    
            data.append({
              "title" : title,
              "price" : price
            })
                        
        page_num += 1
        print(page_num)
        
        if "next" in results.get("pagination", {}):
            params['_pgn'] += 1
    
        else:
            break
        
    pd.DataFrame(data=data).to_csv("ebay_products.csv", index=False)
    

    Output:
    file is created: "ebay_products.csv"

    There’s a “13 ways to scrape any public data from any website” blog post if you want to know more about website scraping.
