
I tried to make a little Beautiful Soup script to analyze prices on eBay. The problem is that my soup.findAll() call, which should find the prices, sometimes works and sometimes just returns an empty list, and I am wondering why. Here is my code:

import requests
from bs4 import BeautifulSoup


article = input("Product:")
keywords = article.strip().replace(" ", "+")
URL_s = "https://www.ebay.de/sch/i.html?_dmd=1&_fosrp=1&LH_SALE_CURRENCY=0&_sop=12&_ipg=50&LH_Complete=1&LH_Sold=1&_sadis=10&_from=R40&_sacat=0&_nkw=" + keywords + "&_dcat=139971&rt=nc&LH_ItemCondition=3"


source = requests.get(URL_s).text
soup = BeautifulSoup(source, "html.parser")

prices = soup.findAll('span', class_='bold bidsold')
# ^ this line sometimes finds the prices, sometimes it just produces an empty list ^

Any help would be very welcome. Hope you are doing well, bye bye 🙂

3 Answers


  1. Maybe the prices are rendered by JavaScript. Requests does not execute or wait for JavaScript to load.

    That is why you should use a tool that renders JavaScript, such as Selenium or DryScrape.

  2. If you look at the variable soup and open the result as an HTML page, you would see something like this:

    [Screenshot: eBay returns an identity-verification page instead of the search results.]

    This means that eBay has some sort of filtering mechanism to prevent scraping and requires you to somehow confirm your identity. This is why your query for prices returns an empty list.
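A quick way to detect this case in code is to check the returned HTML for challenge-page markers before parsing. The marker strings below are assumptions for illustration; inspect the actual blocked response to find the exact text eBay uses:

```python
def looks_blocked(html: str) -> bool:
    # Hypothetical markers; eBay's real challenge page may use different wording.
    markers = ("captcha", "verify yourself", "confirm your identity")
    html_lower = html.lower()
    return any(marker in html_lower for marker in markers)

# Usage sketch:
# page = requests.get(URL_s)
# if looks_blocked(page.text):
#     print("Blocked: eBay returned a verification page instead of results.")
```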

  3. When using requests, the request may be blocked because the default user-agent in the requests library is python-requests. For the website to understand that the request does not come from a bot or script, you need to pass your real User-Agent in the headers.

    You can also read the Reducing the chance of being blocked while web scraping blog post to learn about other ways of solving this problem.

    If you want to collect the information from all pages, you can use a while loop that paginates through them dynamically.

    The loop runs until a stop condition is met; in our case, it terminates when there is no next page, which is detected via the ".pagination__next" CSS selector.

    Check code in online IDE.

    from bs4 import BeautifulSoup
    import requests, json, lxml
    
    # https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
    }
    query = input('Your query is: ')  # "shirt" for example
    params = {
        '_nkw': query,           # search query  
        '_pgn': 1,               # page number
        'LH_Sold': '1'           # shows sold items
    }
    
    data = []
    page_limit = 10              # page limit (if you need)
    while True: 
        page = requests.get('https://www.ebay.de/sch/i.html', params=params, headers=headers, timeout=30)
        soup = BeautifulSoup(page.text, 'lxml')
        
        print(f"Extracting page: {params['_pgn']}")
    
        print("-" * 10)
        
        for products in soup.select(".s-item__info"):
            title = products.select_one(".s-item__title span").text
            price = products.select_one(".s-item__price").text
            
            data.append({
              "title" : title,
              "price" : price
            })
    
        if params['_pgn'] == page_limit:
            break
      
        if soup.select_one(".pagination__next"):
            params['_pgn'] += 1
        else:
            break
          
    print(json.dumps(data, indent=2, ensure_ascii=False))
    

    Example output:

    [
      {
        "title": "CECIL Pullover Damen Hoodie Sweatshirt Gr. L (DE 44) Baumwolle flieder #7902fa2",
        "price": "EUR 17,64"
      },
      {
        "title": "Shirt mit Schlangendruck & Strass \"cyclam\" Gr. 40 UVP: 49,99€ 5.65",
        "price": "EUR 6,50"
      },
      {
        "title": "Fender Guitars Herren T-Shirt von Difuzed - Größe Medium blau auf blau - Sehr guter Zustand",
        "price": "EUR 10,06"
      },
      other results ...
    ]
    

    As an alternative, you can use Ebay Organic Results API from SerpApi. It’s a paid API with a free plan that handles blocks and parsing on their backend.

    Example code with pagination:

    from serpapi import EbaySearch
    import os, json
    
    query = input('Your query is: ')      # "shirt" for example
    
    params = {
        "api_key": "...",                 # serpapi key, https://serpapi.com/manage-api-key   
        "engine": "ebay",                 # search engine
        "ebay_domain": "ebay.com",        # ebay domain
        "_nkw": query,                    # search query
        "_pgn": 1,                        # page number (needed below for pagination)
      # "LH_Sold": "1"                    # shows sold items
    }
    
    search = EbaySearch(params)        # where data extraction happens
    
    page_num = 0
    
    data = []
    
    while True:
        results = search.get_dict()     # JSON -> Python dict
    
        if "error" in results:
            print(results["error"])
            break
        
        for organic_result in results.get("organic_results", []):
            title = organic_result.get("title")
            price = organic_result.get("price")
    
            data.append({
              "price" : price,
              "title" : title
            })
                        
        page_num += 1
        print(page_num)
        
        if "next" in results.get("pagination", {}):
            params['_pgn'] += 1
    
        else:
            break
    
    print(json.dumps(data, indent=2))
    

    Output:

    [
      {
        "price": {
          "raw": "EUR 17,50",
          "extracted": 17.5
        },
        "title": "Mensch zweiter Klasse Gesund und ungeimpft T-Shirt"
      },
      {
        "price": {
          "raw": "EUR 14,90",
          "extracted": 14.9
        },
        "title": "Sprüche Shirt Lustige T-Shirts für Herren oder Unisex Kult Fun Gag Handwerker"
      },
      # ...
    ]
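The `extracted` field already contains numeric values, so aggregating them for price analysis is straightforward. A sketch over results shaped like the output above (the list here is sample data, not live API results):

```python
# Sample results in the same shape as the API output above.
results = [
    {"price": {"raw": "EUR 17,50", "extracted": 17.5}, "title": "..."},
    {"price": {"raw": "EUR 14,90", "extracted": 14.9}, "title": "..."},
]

prices = [r["price"]["extracted"] for r in results]
average = sum(prices) / len(prices)
print(f"min: {min(prices)}, max: {max(prices)}, avg: {average:.2f}")
# min: 14.9, max: 17.5, avg: 16.20
```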
    

    There is also the 13 ways to scrape any public data from any website blog post if you want to learn more about web scraping.
