I can scrape a title of a single ebay listing, but am having trouble scraping every single title on the page (Python/BeautifulSoup/lxml)

Nick
August 27, 2020
175 views
3 votes
4 Answers

I am trying to scrape the title of every item on an eBay page. This is the page. I first tried to scrape the title of the first listing (lines 5-7 of my code), and that worked: the title of the first listing gets printed. But when I try to scrape every single title on the page (lines 8-10), nothing gets printed. Is there a flaw in my logic? Thanks!

1. from bs4 import BeautifulSoup
2. import requests
3. source = requests.get("https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=hippo&_sacat=0").text
4. soup = BeautifulSoup(source, "lxml")
5. listing = soup.find("li", class_=("s-item    s-item--watch-at-corner"))
6. title = soup.find("h3", class_=("s-item__title")).text
7. print(title)
8. for listing in soup.find_all("li", class_=("s-item    s-item--watch-at-corner")):
9.    title = soup.find("h3", class_=("s-item__title")).text
10.   print(title)

Answers

You’re calling find("h3", class_="s-item__title") on the soup every time. You need to call it on each listing inside the loop, otherwise it will always fetch the first title on the page. Also, keep in mind that there were a couple of hidden results on the eBay page for whatever reason, so check whether you want to ignore or include those. I added the enumerate function to the loop just to keep track of the result number.

I used this selector in Chrome DevTools to find all the listings: li.s-item.s-item--watch-at-corner h3.s-item__title

from bs4 import BeautifulSoup
import requests

source = requests.get("https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=hippo&_sacat=0").text
soup = BeautifulSoup(source, "lxml")
listing = soup.find("li", class_=("s-item    s-item--watch-at-corner"))
title = soup.find("h3", class_=("s-item__title")).text
print(title)
for i, listing in enumerate(soup.find_all("li", class_=("s-item s-item--watch-at-corner"))):
    title = listing.find("h3", class_=("s-item__title")).text
    print("[{}] ".format(i) + title)

Result:

    [0] Pewter Hippopotamus Hippo  Figurine 
    [1] Hippopotamus Figurine 1.5" Gemstone Opalite Crystal Healing Carved Statue Decor 
    [2] hippopotamus coffee cafe picture animal hippo art tile gift
    [3] NEW! Miniature Bronze Hippo Figurine Miniature Bronze Statue Animal Collectible
    [4] Hippopotamus Gzhel porcelain figurine hippo handmade
    [5] Hippopotamus Gzhel porcelain figurine hippo souvenir handmade and hand-painted
....
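
The difference between searching the whole soup and searching each listing can be reproduced offline. The snippet below is a minimal sketch: the HTML is made up for illustration and only mimics the class names from the question, it is not eBay's actual markup, and it uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the class names used in the question.
html = """
<ul>
  <li class="s-item s-item--watch-at-corner"><h3 class="s-item__title">Pewter hippo</h3></li>
  <li class="s-item s-item--watch-at-corner"><h3 class="s-item__title">Bronze hippo</h3></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
listings = soup.find_all("li", class_="s-item")

# Searching the whole document: the first title comes back on every iteration.
from_soup = [soup.find("h3", class_="s-item__title").text for _ in listings]

# Searching inside each listing: one title per listing.
from_listing = [li.find("h3", class_="s-item__title").text for li in listings]

print(from_soup)      # ['Pewter hippo', 'Pewter hippo']
print(from_listing)   # ['Pewter hippo', 'Bronze hippo']
```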

- Ryan
- August 27, 2020 at 10:54 pm
- 0 votes
After a quick glance at the docs:

BeautifulSoup’s .find_all() method returns a list (as one would expect). However, it seems to me that the .find() in your for loop is just querying the response again, rather than doing something with the list you’re generating. I would expect either extracting the titles manually, such as:
title = listing['some_property']
or perhaps there’s another method provided by the library you’re using.
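
For what it's worth, each item that find_all() returns is a Tag object, so both approaches are available: square brackets read an attribute of the element, and .find() searches inside it. A small sketch on made-up markup (the data-view attribute is hypothetical and chosen only for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical markup; "data-view" is an invented attribute for this example.
html = '<li class="s-item" data-view="x1"><h3 class="s-item__title">Glass hippo</h3></li>'
soup = BeautifulSoup(html, "html.parser")

results = soup.find_all("li", class_="s-item")
first = results[0]

attr_value = first["data-view"]                 # attribute access on the Tag
classes = first["class"]                        # multi-valued attribute: a list
title = first.find("h3").get_text(strip=True)   # search within this Tag only

print(attr_value, classes, title)
```

Note that the title is the text of a child element rather than an attribute of the li, so here .find() on the listing is the approach that fits.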

- chost
- August 27, 2020 at 11:06 pm
- 0 votes
Looking at the code, you haven’t checked the type of the object that find() returns for the listing.
```
from bs4 import BeautifulSoup
import requests
source = requests.get("https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=hippo&_sacat=0").text
soup = BeautifulSoup(source, "lxml")
listing = soup.find("li", class_=("s-item    s-item--watch-at-corner"))
title = soup.find("h3", class_=("s-item__title")).text
print(type(listing))
```
This prints:
```
<class 'NoneType'>
```
So find() matched no li tag, and the loop has nothing to parse.
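
One possible cause, assuming the four spaces inside the class string are really in the original code: when class_ is given a string containing whitespace, BeautifulSoup compares it with the exact class attribute value, so extra spaces prevent a match. The sketch below shows this on made-up markup (not eBay's actual HTML) together with a None check and a CSS selector that does not depend on spacing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the class names from the question.
html = '<li class="s-item s-item--watch-at-corner"><h3 class="s-item__title">Wooden hippo</h3></li>'
soup = BeautifulSoup(html, "html.parser")

# Four spaces between the class names, as in the question: no match.
padded = soup.find("li", class_="s-item    s-item--watch-at-corner")

# A single space matches the exact class attribute value.
exact = soup.find("li", class_="s-item s-item--watch-at-corner")

# A CSS selector ignores the order and spacing of the class names.
selected = soup.select_one("li.s-item.s-item--watch-at-corner")

for label, tag in [("padded", padded), ("exact", exact), ("selected", selected)]:
    if tag is None:
        print(label, "-> no match")
    else:
        print(label, "->", tag.h3.text)
```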

Have a look at the SelectorGadget Chrome extension, which lets you pick CSS selectors by clicking on the desired element in your browser. It does not always work perfectly if the page relies heavily on JS, but in this case it does.

The request may also be blocked: by default, the requests library sends a user-agent of the form python-requests/<version>, which identifies the client as a script.

An additional step could be to rotate the user-agent, for example to switch between PC, mobile, and tablet, as well as between browsers such as Chrome, Firefox, Safari, and Edge.
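
A rough sketch of what rotation could look like: pick a user-agent at random for each request. The pool and the build_headers helper are hypothetical names for this example, and the strings are illustrative samples that may be out of date.

```python
import random

# Illustrative pool (an assumption for this sketch): desktop Chrome,
# desktop Firefox, and mobile Safari. Replace with current strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1",
]

def build_headers():
    """Return request headers with a randomly chosen user-agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}

headers = build_headers()
print(headers["User-Agent"])
```

The resulting dict can be passed as headers=headers to requests.get(). Rotation alone is not guaranteed to avoid blocks.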

Check out the code in the online IDE

from bs4 import BeautifulSoup
import requests, json, lxml

# https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
}
   
params = {
    '_nkw': 'hippo',              # search query  
}

data = []

page = requests.get('https://www.ebay.com/sch/i.html', params=params, headers=headers, timeout=30)
soup = BeautifulSoup(page.text, 'lxml')
    
for title in soup.select(".s-item__title span"):
    # skip the "Shop on eBay" placeholder entry; compare against the text, not the Tag
    if "Shop on eBay" in title.text:
        continue
    data.append({"title": title.text})
 
print(json.dumps(data, indent=2, ensure_ascii=False))

Example output:

[
  {
    "title": "Hippopotamus Gzhel porcelain figurine hippo souvenir handmade and hand-painted"
  },
  {
    "title": "Coad Peru Signed 1 1/2\" HIPPO Clay Pottery Collectible Figurine"
  },
  {
    "title": "Glass Hippo Hippopotamus figurine \"murano\" handmade"
  },
  {
    "title": "2 Hand Carved Soapstone Brown Hippopotamus Hippo Animal Figurine Paperweight"
  },
  {
    "title": "Schleich Hippo D-73527 Hippopotamus Mouth Open Wildlife Toy Figure 2012"
  },
  # ...
]

As an alternative, you can use Ebay Organic Results API from SerpApi. It’s a paid API with a free plan that handles blocks and parsing on their backend.

Example code:

from serpapi import EbaySearch
import os, json

params = {
    "api_key": os.getenv("API_KEY"),  # serpapi api key    
    "engine": "ebay",                 # search engine
    "ebay_domain": "ebay.com",        # ebay domain
    "_nkw": "hippo"                   # search query                
}

search = EbaySearch(params)        # where data extraction happens

data = []

results = search.get_dict()     # JSON -> Python dict

for organic_result in results.get("organic_results", []):
    title = organic_result.get("title")
    
    data.append({
      "title" : title      
    })
      
print(json.dumps(data, indent=2))

Output:

[
   {
    "title": "Schleich Hippo D-73527 Hippopotamus Mouth Open Wildlife Toy Figure 2012"
  },
  {
    "title": "Vintage WOODEN SCULPTURE Hippo HIPPOPOTAMUS"
  },
  {
    "title": "Hippopotamus Gzhel porcelain figurine hippo souvenir handmade and hand-painted"
  },
  {
    "title": "Glass Hippo Hippopotamus figurine \"murano\" handmade"
  },
  # ...
]
