
I tried to make a little Beautiful Soup script to analyze prices on eBay. The problem is that my soup.findAll() call, which should find the prices, sometimes works and sometimes just returns an empty list, and I am wondering why. Here is my code:

import requests
from bs4 import BeautifulSoup


article = input("Product:")
keywords = article.strip().replace(" ", "+")
URL_s = "https://www.ebay.de/sch/i.html?_dmd=1&_fosrp=1&LH_SALE_CURRENCY=0&_sop=12&_ipg=50&LH_Complete=1&LH_Sold=1&_sadis=10&_from=R40&_sacat=0&_nkw=" + keywords + "&_dcat=139971&rt=nc&LH_ItemCondition=3"


source = requests.get(URL_s).text
soup = BeautifulSoup(source, "html.parser")

prices = soup.findAll('span', class_='bold bidsold')
# ^ this line sometimes finds the prices, sometimes it just produces an empty list ^

Any help would be very welcome. Hope you are doing well, bye bye 🙂

3 Answers


  1. Maybe the prices are rendered by JavaScript. Requests does not execute or wait for JavaScript to load.

    That is why you should use a tool that renders JavaScript, such as Selenium or DryScrape.

  2. If you look at the variable soup and open the result as an HTML page, you would see something like this:

    [Screenshot: eBay returns an identity-verification page instead of the search results.]

    This means that eBay has some sort of filtering mechanism to prevent scraping and requires you to somehow confirm your identity. This is why your query for prices returns an empty list.
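A quick way to detect this case in code is to check the returned HTML for challenge-page markers before parsing. The marker strings below are assumptions for illustration; inspect the actual blocked response to find the exact text eBay uses:

```python
def looks_blocked(html: str) -> bool:
    # Hypothetical markers; eBay's real challenge page may use different wording.
    markers = ("captcha", "verify yourself", "confirm your identity")
    html_lower = html.lower()
    return any(marker in html_lower for marker in markers)

# Usage sketch:
# page = requests.get(URL_s)
# if looks_blocked(page.text):
#     print("Blocked: eBay returned a verification page instead of results.")
```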

  3. When using requests, the request may be blocked because the default user-agent in the requests library is python-requests. For the website to understand that the request does not come from a bot or script, you need to pass your real User-Agent in the headers.

    You can also read the Reducing the chance of being blocked while web scraping blog post to learn about other ways of solving this problem.

    If you want to collect the information from all pages, you can use a while loop that paginates through them dynamically.

    The loop runs until a stop condition is met; in our case, it terminates when there is no next page, which is detected via the ".pagination__next" CSS selector.

    Check code in online IDE.

    from bs4 import BeautifulSoup
    import requests, json, lxml
    
    # https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
    }
    query = input('Your query is: ')  # "shirt" for example
    params = {
        '_nkw': query,           # search query  
        '_pgn': 1,               # page number
        'LH_Sold': '1'           # shows sold items
    }
    
    data = []
    page_limit = 10              # page limit (if you need)
    while True: 
        page = requests.get('https://www.ebay.de/sch/i.html', params=params, headers=headers, timeout=30)
        soup = BeautifulSoup(page.text, 'lxml')
        
        print(f"Extracting page: {params['_pgn']}")
    
        print("-" * 10)
        
        for products in soup.select(".s-item__info"):
            title = products.select_one(".s-item__title span").text
            price = products.select_one(".s-item__price").text
            
            data.append({
              "title" : title,
              "price" : price
            })
    
        if params['_pgn'] == page_limit:
            break
      
        if soup.select_one(".pagination__next"):
            params['_pgn'] += 1
        else:
            break
          
    print(json.dumps(data, indent=2, ensure_ascii=False))
    

    Example output:

    [
      {
        "title": "CECIL Pullover Damen Hoodie Sweatshirt Gr. L (DE 44) Baumwolle flieder #7902fa2",
        "price": "EUR 17,64"
      },
      {
        "title": "Shirt mit Schlangendruck & Strass \"cyclam\" Gr. 40 UVP: 49,99€ 5.65",
        "price": "EUR 6,50"
      },
      {
        "title": "Fender Guitars Herren T-Shirt von Difuzed - Größe Medium blau auf blau - Sehr guter Zustand",
        "price": "EUR 10,06"
      },
      other results ...
    ]
    

    As an alternative, you can use Ebay Organic Results API from SerpApi. It’s a paid API with a free plan that handles blocks and parsing on their backend.

    Example code with pagination:

    from serpapi import EbaySearch
    import os, json
    
    query = input('Your query is: ')      # "shirt" for example
    
    params = {
        "api_key": "...",                 # serpapi key, https://serpapi.com/manage-api-key   
        "engine": "ebay",                 # search engine
        "ebay_domain": "ebay.com",        # ebay domain
        "_nkw": query,                    # search query
        "_pgn": 1,                        # page number (needed below for pagination)
      # "LH_Sold": "1"                    # shows sold items
    }
    
    search = EbaySearch(params)        # where data extraction happens
    
    page_num = 0
    
    data = []
    
    while True:
        results = search.get_dict()     # JSON -> Python dict
    
        if "error" in results:
            print(results["error"])
            break
        
        for organic_result in results.get("organic_results", []):
            title = organic_result.get("title")
            price = organic_result.get("price")
    
            data.append({
              "price" : price,
              "title" : title
            })
                        
        page_num += 1
        print(page_num)
        
        if "next" in results.get("pagination", {}):
            params['_pgn'] += 1
    
        else:
            break
    
    print(json.dumps(data, indent=2))
    

    Output:

    [
      {
        "price": {
          "raw": "EUR 17,50",
          "extracted": 17.5
        },
        "title": "Mensch zweiter Klasse Gesund und ungeimpft T-Shirt"
      },
      {
        "price": {
          "raw": "EUR 14,90",
          "extracted": 14.9
        },
        "title": "Sprüche Shirt Lustige T-Shirts für Herren oder Unisex Kult Fun Gag Handwerker"
      },
      # ...
    ]
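The `extracted` field already contains numeric values, so aggregating them for price analysis is straightforward. A sketch over results shaped like the output above (the list here is sample data, not live API results):

```python
# Sample results in the same shape as the API output above.
results = [
    {"price": {"raw": "EUR 17,50", "extracted": 17.5}, "title": "..."},
    {"price": {"raw": "EUR 14,90", "extracted": 14.9}, "title": "..."},
]

prices = [r["price"]["extracted"] for r in results]
average = sum(prices) / len(prices)
print(f"min: {min(prices)}, max: {max(prices)}, avg: {average:.2f}")
# min: 14.9, max: 17.5, avg: 16.20
```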
    

    There is also the 13 ways to scrape any public data from any website blog post if you want to learn more about web scraping.
