
I am trying to scrape the title of every item on an eBay page. This is the page. I first tried to scrape the title of the first listing (lines 5-7 of my code), and that worked: the title of the first listing gets printed. But when I try to scrape every single title on the eBay page (lines 8-10), nothing gets printed. Is there a flaw in my logic? Thanks!

1. from bs4 import BeautifulSoup
2. import requests
3. source = requests.get("https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=hippo&_sacat=0").text
4. soup = BeautifulSoup(source, "lxml")
5. listing = soup.find("li", class_=("s-item    s-item--watch-at-corner"))
6. title = soup.find("h3", class_=("s-item__title")).text
7. print(title)
8. for listing in soup.find_all("li", class_=("s-item    s-item--watch-at-corner")):
9.    title = soup.find("h3", class_=("s-item__title")).text
10.   print(title)

4 Answers


  1. You’re calling find("h3", class_=("s-item__title")) on the soup every time. You need to call it on each listing in the loop, or it will always fetch the first title. Also, keep in mind that there were a couple of hidden results on the eBay page for whatever reason; check those and decide whether you want to ignore or include them. I added the enumerate function to the loop just to keep track of the result numbers.

    I used this selector in Chrome DevTools to find all the listings: li.s-item.s-item--watch-at-corner h3.s-item__title

    from bs4 import BeautifulSoup
    import requests
    
    source = requests.get("https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=hippo&_sacat=0").text
    soup = BeautifulSoup(source, "lxml")
    
    # First listing only (same as lines 5-7 of the question)
    listing = soup.find("li", class_="s-item s-item--watch-at-corner")
    title = soup.find("h3", class_="s-item__title").text
    print(title)
    
    # Every listing: search inside each listing, not the whole soup
    for i, listing in enumerate(soup.find_all("li", class_="s-item s-item--watch-at-corner")):
        title = listing.find("h3", class_="s-item__title").text
        print("[{}] ".format(i) + title)
    

    Result:

        [0] Pewter Hippopotamus Hippo  Figurine 
        [1] Hippopotamus Figurine 1.5" Gemstone Opalite Crystal Healing Carved Statue Decor 
        [2] hippopotamus coffee cafe picture animal hippo art tile gift
        [3] NEW! Miniature Bronze Hippo Figurine Miniature Bronze Statue Animal Collectible
        [4] Hippopotamus Gzhel porcelain figurine hippo handmade
        [5] Hippopotamus Gzhel porcelain figurine hippo souvenir handmade and hand-painted
    ....
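
As a rough offline illustration of the point above (a sketch, not the answer author's code), the snippet below parses a small hypothetical HTML fragment instead of the live eBay page. The markup, the titles, and the variable names are assumptions made up for the example. It compares searching the whole soup with searching inside each listing, and also shows that a multi-class string passed to class_ appears to be matched against the exact class attribute value, so the extra spaces in the question's class string may be a second reason the loop printed nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for an eBay results page.
html = """
<ul>
  <li class="s-item s-item--watch-at-corner"><h3 class="s-item__title">Pewter hippo</h3></li>
  <li class="s-item s-item--watch-at-corner"><h3 class="s-item__title">Bronze hippo</h3></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-class string is compared with the exact class attribute value,
# so the padded variant (four spaces) finds nothing.
padded = soup.find_all("li", class_="s-item    s-item--watch-at-corner")
exact = soup.find_all("li", class_="s-item s-item--watch-at-corner")

# Searching the whole soup returns the first title on every iteration ...
from_soup = [soup.find("h3", class_="s-item__title").text for _ in exact]
# ... while searching inside each listing returns that listing's own title.
from_listing = [li.find("h3", class_="s-item__title").text for li in exact]

# The CSS selector quoted in the answer can be used directly with select().
via_selector = [
    h3.text
    for h3 in soup.select("li.s-item.s-item--watch-at-corner h3.s-item__title")
]

print(len(padded), from_soup, from_listing, via_selector)
```

The select() variant does not depend on the order or spacing of the class names, which may make it the more forgiving choice here.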
    
  2. After a quick glance at the docs:

    BeautifulSoup’s .find_all() method returns a list (as one would expect). However, it seems to me that the .find() in your for loop is just querying the whole response again, rather than doing something with the items of the list you’re generating. I would expect you to either extract the titles manually, such as:

    title = listing['some_property']

    or perhaps there’s another method provided by the library you’re using.

  3. Looking at the code, you haven’t checked the type of listing.

    from bs4 import BeautifulSoup
    import requests
    source = requests.get("https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=hippo&_sacat=0").text
    soup = BeautifulSoup(source, "lxml")
    listing = soup.find("li", class_=("s-item    s-item--watch-at-corner"))
    title = soup.find("h3", class_=("s-item__title")).text
    print(type(listing))
    

    This prints:

    <class 'NoneType'>
    

    So the parsing ends there, as no matching li tags are found.
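
To make the NoneType observation concrete, here is a minimal sketch that runs on a hypothetical inline HTML snippet rather than the eBay page (the markup and class names are made up for the example). It shows that find() returns None when nothing matches, while find_all() returns an empty result, so a for loop over it simply never runs and nothing is printed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup in which no <li> carries the class we search for.
soup = BeautifulSoup("<ul><li class='other'>x</li></ul>", "html.parser")

listing = soup.find("li", class_="s-item")      # None when nothing matches
matches = soup.find_all("li", class_="s-item")  # empty result when nothing matches

if listing is None:
    print("no matching <li> found, check the class string")

for li in matches:
    # Never reached: there is nothing to iterate over, and no error is raised.
    print(li.text)
```

Checking for None (or for an empty result) before using the value avoids both the silent empty loop and an AttributeError on .text.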

  4. Have a look at the SelectorGadget Chrome extension, which lets you pick selectors by clicking on the desired element in your browser. It does not always work perfectly if the page relies heavily on JS (in this case it works).

    The request may also get blocked, since the default user-agent sent by the requests library is python-requests.

    An additional step could be to rotate the user-agent, for example to switch between PC, mobile, and tablet, as well as between browsers such as Chrome, Firefox, Safari, Edge and so on.

    Check out the code in the online IDE

    from bs4 import BeautifulSoup
    import requests, json, lxml
    
    # https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
    }
       
    params = {
        '_nkw': 'hippo',              # search query  
    }
    
    data = []
    
    page = requests.get('https://www.ebay.com/sch/i.html', params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(page.text, 'lxml')
        
    for title in soup.select(".s-item__title span"):
        # skip the "Shop on eBay" placeholder entry
        if "Shop on eBay" in title.text:
            continue
        data.append({"title": title.text})
     
    print(json.dumps(data, indent=2, ensure_ascii=False))
    

    Example output:

    [
      {
        "title": "Hippopotamus Gzhel porcelain figurine hippo souvenir handmade and hand-painted"
      },
      {
        "title": "Coad Peru Signed 1 1/2\" HIPPO Clay Pottery Collectible Figurine"
      },
      {
        "title": "Glass Hippo Hippopotamus figurine \"murano\" handmade"
      },
      {
        "title": "2 Hand Carved Soapstone Brown Hippopotamus Hippo Animal Figurine Paperweight"
      },
      {
        "title": "Schleich Hippo D-73527 Hippopotamus Mouth Open Wildlife Toy Figure 2012"
      },
      # ...
    ]
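
The user-agent rotation mentioned earlier in this answer could be sketched roughly as follows. This is only an illustration: random_headers is a hypothetical helper, and apart from the first string (taken from the code above) the user-agent strings are assumed examples that may be outdated.

```python
import random

# Illustrative user-agent strings. The first one is from the answer's code;
# the other two are assumed examples and may not match current browsers.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0",
]


def random_headers():
    """Return a headers dict with a randomly chosen User-Agent (hypothetical helper)."""
    return {"User-Agent": random.choice(USER_AGENTS)}


headers = random_headers()
print(headers["User-Agent"])
```

The result can be passed as headers=random_headers() in the requests.get call shown above, so that each request may present a different browser.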
    

    As an alternative, you can use Ebay Organic Results API from SerpApi. It’s a paid API with a free plan that handles blocks and parsing on their backend.

    Example code:

    from serpapi import EbaySearch
    import os, json
    
    params = {
        "api_key": os.getenv("API_KEY"),  # serpapi api key    
        "engine": "ebay",                 # search engine
        "ebay_domain": "ebay.com",        # ebay domain
        "_nkw": "hippo"                   # search query                
    }
    
    search = EbaySearch(params)        # where data extraction happens
    
    data = []
    
    results = search.get_dict()     # JSON -> Python dict
    
    for organic_result in results.get("organic_results", []):
        title = organic_result.get("title")
        
        data.append({
          "title" : title      
        })
          
    print(json.dumps(data, indent=2))
    

    Output:

    [
      {
        "title": "Schleich Hippo D-73527 Hippopotamus Mouth Open Wildlife Toy Figure 2012"
      },
      {
        "title": "Vintage WOODEN SCULPTURE Hippo HIPPOPOTAMUS"
      },
      {
        "title": "Hippopotamus Gzhel porcelain figurine hippo souvenir handmade and hand-painted"
      },
      {
        "title": "Glass Hippo Hippopotamus figurine \"murano\" handmade"
      },
      # ...
    ]
    