I’m building a Python web scraper that goes through an eBay search results page (in this case, ‘Gaming laptops’) and grabs the title of each item for sale. I’m using BeautifulSoup to first grab the h1 tag where each title is stored, then print it out as text:
for item_name in soup.findAll('h1', {'class': 'it-ttl'}):
    print(item_name.text)
However, within each h1 tag with the class of ‘it-ttl’, there is also a span tag that contains some text:
<h1 class="it-ttl" itemprop="name" id="itemTitle">
<span class="g-hdn">Details about </span>
Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…
</h1>
My current program prints out both the contents of the span tag AND the item title:
[Image: my console output]
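To see why the span text shows up, here is a minimal reproduction built from the HTML in the question. It assumes the built-in "html.parser" backend and uses `find_all`, the current name for `findAll`. `.text` joins every string inside the tag, including strings in nested tags such as the span:

```python
from bs4 import BeautifulSoup

# HTML copied from the question (the title is truncated exactly as shown there)
html = '''
<h1 class="it-ttl" itemprop="name" id="itemTitle">
<span class="g-hdn">Details about </span>
Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…
</h1>
'''

soup = BeautifulSoup(html, "html.parser")

texts = []
for item_name in soup.find_all("h1", {"class": "it-ttl"}):
    # .text concatenates all descendant strings, so the <span>'s
    # "Details about " is included along with the title itself
    texts.append(item_name.text)
    print(repr(item_name.text))
```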
Could someone explain how to grab just the item title while ignoring the span tag containing the “Details about” text? Thanks!
Answers
It can be done by just removing the offending <span> before reading the text.
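A minimal sketch of the removal approach, assuming the HTML structure shown in the question (not necessarily the answerer's exact code). It deletes the `g-hdn` span with BeautifulSoup's `decompose()` and then reads the remaining text with `get_text(strip=True)`:

```python
from bs4 import BeautifulSoup

# HTML copied from the question
html = '''
<h1 class="it-ttl" itemprop="name" id="itemTitle">
<span class="g-hdn">Details about </span>
Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…
</h1>
'''

soup = BeautifulSoup(html, "html.parser")

titles = []
for item_name in soup.find_all("h1", {"class": "it-ttl"}):
    # Assumption: the unwanted text always lives in <span class="g-hdn">
    span = item_name.find("span", {"class": "g-hdn"})
    if span is not None:
        span.decompose()  # removes the span and its text from the tree
    # strip=True trims each remaining string and drops whitespace-only ones
    title = item_name.get_text(strip=True)
    titles.append(title)
    print(title)
```

`decompose()` modifies the parsed tree in place; if you still need the span afterwards, `extract()` removes it and returns it instead.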
Another solution:
Here are more examples. https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
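A further BeautifulSoup-only option, sketched under the same assumption about the HTML (this is an alternative, not the library used in the linked examples). It keeps only the strings that are direct children of the h1, so the span's text is skipped without modifying the tree:

```python
from bs4 import BeautifulSoup

# HTML copied from the question
html = '''
<h1 class="it-ttl" itemprop="name" id="itemTitle">
<span class="g-hdn">Details about </span>
Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…
</h1>
'''

soup = BeautifulSoup(html, "html.parser")

titles = []
for item_name in soup.find_all("h1", {"class": "it-ttl"}):
    # string=True matches text nodes; recursive=False limits the search to
    # direct children, so text nested inside <span> is not returned
    direct_text = item_name.find_all(string=True, recursive=False)
    title = "".join(direct_text).strip()
    titles.append(title)
    print(title)
```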