Scraping eBay Sold Items using Beautiful Soup

The_Bandit
August 16, 2021

I’m trying to scrape sold items on eBay from this search page:

https://www.ebay.com/sch/i.html?_from=R40&_nkw=oakley+sunglasses&_sacat=0&Brand=Oakley&rt=nc&LH_Sold=1&LH_Complete=1&_ipg=200&_oaa=1&_fsrp=1&_dcat=79720

Here is my code, where I load the HTML and convert it to a soup object:

    import requests
    from bs4 import BeautifulSoup as bs

    ebay_url = 'https://www.ebay.com/sch/i.html?_from=R40&_nkw=oakley+sunglasses&_sacat=0&Brand=Oakley&rt=nc&LH_Sold=1&LH_Complete=1&_ipg=200&_oaa=1&_fsrp=1&_dcat=79720'
    response = requests.get(ebay_url)

    soup = bs(response.text, 'html.parser')
    #print(soup.prettify())

I’m working on getting the titles, prices, and sold dates and then loading them into a CSV file. Here is the code I have for the titles:

    title = soup.find_all("h3", "s-item__title s-item__title--has-tags")
    print(title)

    listing_titles = []

    for i in range(1,len(title)):
        listing_titles.append(title[i].text)

    print(listing_titles)

This just returns an empty list, []. The HTML soup object prints correctly, and the response status is 200. It seems that my code should work, and that finding the price and sale date should be similar. I’m wondering if this is a job for Selenium. Hopefully someone can help! Thanks!

Answers

- BhavyaParikh
- August 16, 2021 at 7:13 am
First, find all the result divs by class, then loop over them to get the sold date, title, and price of each item:
```
# Each search result sits in a div with these classes; [1:] skips the first match.
main_data = soup.find_all("div", class_="s-item__info clearfix")[1:]
for i in main_data:
    print(i.find("span", class_="POSITIVE").get_text())  # sold date
    print(i.find("h3", class_="s-item__title s-item__title--has-tags").get_text())  # title
    print(i.find("span", class_="s-item__price").get_text())  # price
```
Output:
```
Sold  Aug 15, 2021
Oakley A Wire 2.0  Sunglasses Brushed Thick Frames Green Lenses
$185.00
...
```
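A note on the class lookups used above: when Beautiful Soup is given a string with several classes, it matches only elements whose class attribute is exactly that string, so results without the `--has-tags` class are skipped, and `find()` returns None for items that lack a field. Here is a minimal sketch of a more tolerant lookup. The HTML is made up to imitate two results (real eBay markup may differ), and `text_or_none` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up HTML that imitates two search results; real eBay markup may differ.
html = """
<div class="s-item__info clearfix">
  <h3 class="s-item__title s-item__title--has-tags">Listing with tags</h3>
  <span class="s-item__price">$185.00</span>
</div>
<div class="s-item__info clearfix">
  <h3 class="s-item__title">Listing without tags</h3>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-class string matches only when the class attribute is exactly that string.
exact = soup.find_all("h3", class_="s-item__title s-item__title--has-tags")
# A CSS selector matches every element that carries the class, whatever else it has.
by_selector = soup.select("h3.s-item__title")
print(len(exact), len(by_selector))


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None when nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


titles = []
prices = []
for item in soup.select("div.s-item__info"):
    titles.append(text_or_none(item, ".s-item__title"))
    prices.append(text_or_none(item, ".s-item__price"))

print(titles)
print(prices)
```

With a helper like this, a result that has no price or sold date yields None instead of raising AttributeError in the middle of the loop.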

The result may be empty because eBay may be blocking the request. The default User-Agent in the requests library is python-requests, which tells the website that the request comes from a bot or script rather than a browser. Check which user agent you are sending.
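One way to check is to print the default value and compare it with what a prepared request would carry. This is only a sketch: the browser string is an example value, and nothing is sent over the network.

```python
import requests

# The User-Agent that requests sends when no custom headers are passed.
default_ua = requests.utils.default_user_agent()
print(default_ua)  # "python-requests/" followed by the installed version

# Example browser User-Agent (an illustrative value, not a required one).
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
}

# Build the request without sending it, to inspect the header that would go out.
prepared = requests.Request(
    "GET", "https://www.ebay.com/sch/i.html", headers=headers
).prepare()
print(prepared.headers["User-Agent"])
```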

Besides sending a browser user-agent, an additional step is to rotate user-agents, for example switching between PC, mobile, and tablet, and between browsers such as Chrome, Firefox, Safari, and Edge.
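A rough sketch of such rotation, picking one string at random per request. The User-Agent strings are example values and `random_headers` is a hypothetical helper name; in practice you would keep the list current.

```python
import random

# Example User-Agent strings for different browsers and devices (illustrative values).
USER_AGENTS = [
    # Chrome on Windows
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
    # Firefox on Windows
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    # Safari on iPhone
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1",
]


def random_headers():
    """Return request headers with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}


headers = random_headers()
print(headers["User-Agent"])
```

The result can then be passed as `headers=random_headers()` on each `requests.get` call, so consecutive requests do not all carry the same string.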

It is also possible to fetch the results from all pages. One way to paginate is an infinite while loop that exits when some element (a button, for example) is no longer present on the page.

In our case, the test is whether the page has a next-page button (the .pagination__next selector).
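Reduced to its control flow, the pattern might look like the sketch below. It runs without network access: `FAKE_PAGES` and `fetch_page` are made-up stand-ins for requesting and parsing a page. The `max_pages` cap is my own addition and not part of the approach described above; it keeps the loop from running forever if the exit condition never triggers.

```python
# Stand-in for real pages: page number -> (items, has_next_button).
# Made-up data so the loop can run without any network access.
FAKE_PAGES = {
    1: (["item A", "item B"], True),
    2: (["item C"], True),
    3: (["item D"], False),
}


def fetch_page(page_number):
    """Hypothetical helper; a real one would request and parse the page."""
    return FAKE_PAGES[page_number]


def collect_all(max_pages=50):
    """Collect items page by page until there is no next button or the cap is hit."""
    collected = []
    page_number = 1
    while page_number <= max_pages:
        items, has_next = fetch_page(page_number)
        collected.extend(items)
        if not has_next:  # exit condition: no "next" button on this page
            break
        page_number += 1
    return collected


all_items = collect_all()
print(all_items)
```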

Example code:

```
from bs4 import BeautifulSoup
import requests
import pandas as pd

# The 'lxml' parser used below requires the lxml package to be installed.
# https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
}

params = {
    '_nkw': 'oakley sunglasses',      # search query (requests URL-encodes the space)
    'LH_Sold': '1',                   # shows sold items
    '_pgn': 1                         # page number
}

data = []

while True:
    page = requests.get('https://www.ebay.com/sch/i.html', params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(page.text, 'lxml')

    print(f"Extracting page: {params['_pgn']}")
    print("-" * 10)

    for products in soup.select(".s-item__pl-on-bottom"):
        title = products.select_one(".s-item__title span").text
        price = products.select_one(".s-item__price").text
        try:
            sold_date = products.select_one(".s-item__title--tagblock .POSITIVE").text
        except AttributeError:  # select_one() returned None: no sold date on this item
            sold_date = None

        data.append({
            "title": title,
            "price": price,
            "sold_date": sold_date
        })

    # keep going while the page has a "next" button
    if soup.select_one(".pagination__next"):
        params['_pgn'] += 1
    else:
        break

# save to CSV (requires pandas)
pd.DataFrame(data=data).to_csv("ebay_products.csv", index=False)
```

Output:
file is created: "ebay_products.csv"
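The values collected this way are still raw strings, such as "$185.00" and "Sold  Aug 15, 2021". If you want numbers and dates in the CSV, something like the sketch below could be applied before saving. It uses the standard csv module instead of pandas and assumes US-style prices and English month abbreviations. `parse_price` and `parse_sold_date` are hypothetical helper names, and the second sample row is made up.

```python
import csv
import io
import re
from datetime import datetime


def parse_price(text):
    """Return a float for strings like '$185.00'; None for anything else (e.g. price ranges)."""
    match = re.fullmatch(r"\$([\d,]+\.\d{2})", text.strip())
    return float(match.group(1).replace(",", "")) if match else None


def parse_sold_date(text):
    """Return an ISO date for 'Sold  Aug 15, 2021'; None if the text is missing or different."""
    if not text:
        return None
    cleaned = " ".join(text.split())  # collapse repeated spaces
    try:
        return datetime.strptime(cleaned, "Sold %b %d, %Y").date().isoformat()
    except ValueError:
        return None


rows = [
    {"title": "Oakley A Wire 2.0  Sunglasses Brushed Thick Frames Green Lenses",
     "price": "$185.00", "sold_date": "Sold  Aug 15, 2021"},
    # Made-up row to show the fallback when no sold date was found.
    {"title": "Example listing without a date", "price": "$1,250.00", "sold_date": None},
]

buffer = io.StringIO()  # swap for open("ebay_products.csv", "w", newline="") to write a file
writer = csv.DictWriter(buffer, fieldnames=["title", "price", "sold_date"])
writer.writeheader()
for row in rows:
    writer.writerow({
        "title": row["title"],
        "price": parse_price(row["price"]),
        "sold_date": parse_sold_date(row["sold_date"]),
    })

print(buffer.getvalue())
```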

As an alternative, you can use the eBay Organic Results API from SerpApi. It’s a paid API with a free plan that handles blocks and parsing on their backend.

Example code:

```
from serpapi import EbaySearch
import os
import pandas as pd

params = {
    "api_key": os.getenv("API_KEY"),      # serpapi key, https://serpapi.com/manage-api-key
    "engine": "ebay",                     # search engine
    "ebay_domain": "ebay.com",            # ebay domain
    "_nkw": "oakley+sunglasses",          # search query
    "LH_Sold": "1",                       # shows sold items
    "_pgn": 1                             # page number, incremented in the loop below
}

page_num = 0

data = []

while True:
    search = EbaySearch(params)     # where data extraction happens
    results = search.get_dict()     # JSON -> Python dict

    if "error" in results:
        print(results["error"])
        break

    for organic_result in results.get("organic_results", []):
        title = organic_result.get("title")
        price = organic_result.get("price")

        data.append({
            "title": title,
            "price": price
        })

    page_num += 1
    print(page_num)

    # request the next page while the response reports one
    if "next" in results.get("pagination", {}):
        params['_pgn'] += 1
    else:
        break

pd.DataFrame(data=data).to_csv("ebay_products.csv", index=False)
```

Output:
file is created: "ebay_products.csv"

If you want to know more about website scraping, there’s a blog post called "13 ways to scrape any public data from any website".
