Ebay API - Grabbing the main text content of an HTML tag without the <span> inside

lawrencejon
April 19, 2020

I’m building a Python web scraper that goes through an eBay search results page (in this case, ‘Gaming laptops’) and grabs the title of each item for sale. I’m using BeautifulSoup to grab each h1 tag where a title is stored, then print it out as text:

    for item_name in soup.findAll('h1', {'class': 'it-ttl'}):
        print(item_name.text)

However, within each h1 tag with the class of ‘it-ttl’, there is also a span tag that contains some text:

<h1 class="it-ttl" itemprop="name" id="itemTitle">
 <span class="g-hdn">Details about  &nbsp;</span>
 Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…
</h1>

My current program prints out both the contents of the span tag (“Details about”) and the item title.
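To see why this happens, here is a small sketch using the HTML above (the `html.parser` parser is an assumption; the question does not say which one is used). In BeautifulSoup, `.text` is the same as `get_text()`, which joins every string inside the tag, including strings nested in child tags such as the `<span>`:

```python
from bs4 import BeautifulSoup

html = """
<h1 class="it-ttl" itemprop="name" id="itemTitle">
 <span class="g-hdn">Details about  &nbsp;</span>
 Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…
</h1>
"""

soup = BeautifulSoup(html, "html.parser")
h1 = soup.find("h1", class_="it-ttl")

# .text concatenates every descendant string, so the hidden
# <span>'s "Details about" text comes before the title
print(repr(h1.text))

# The individual strings that .text joins together
print(list(h1.strings))
```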

Could someone explain how to grab just the item title while ignoring the span tag containing the “Details about” text? Thanks!

Answers

It can be done by removing the offending <span> with decompose() before reading the text:

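from bs4 import BeautifulSoup as bs

item = """
<h1 class="it-ttl" itemprop="name" id="itemTitle">
 <span class="g-hdn">Details about  &nbsp;</span>
 Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…
</h1>
"""
soup = bs(item, 'lxml')  # 'html.parser' also works if lxml is not installed
target = soup.select_one('h1.it-ttl')
# Remove the hidden "Details about" <span> from the tree entirely
target.select_one('span.g-hdn').decompose()
# Only the title text remains; strip the surrounding whitespace
print(target.text.strip())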

Output:

Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…
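Applied to the loop from the question, the same idea might look like the sketch below. The two-title `page` string is a hypothetical stand-in for a downloaded page, and the `if` guard assumes some headings might lack the `g-hdn` span (calling `.decompose()` on a missing span would raise an `AttributeError`). `find_all` is the current name for `findAll`, and `class_="it-ttl"` is equivalent to `{'class': 'it-ttl'}`:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page with two item titles
page = """
<h1 class="it-ttl"><span class="g-hdn">Details about  &nbsp;</span>Laptop A</h1>
<h1 class="it-ttl"><span class="g-hdn">Details about  &nbsp;</span>Laptop B</h1>
"""

soup = BeautifulSoup(page, "html.parser")
titles = []
for item_name in soup.find_all("h1", class_="it-ttl"):
    # Drop the hidden "Details about" span, if this heading has one
    hidden = item_name.find("span", class_="g-hdn")
    if hidden is not None:
        hidden.decompose()
    # strip=True trims whitespace around each remaining string
    titles.append(item_name.get_text(strip=True))

for title in titles:
    print(title)
```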

Another solution, using the simplified_scrapy library:

from simplified_scrapy import SimplifiedDoc
html = '''
<h1 class="it-ttl" itemprop="name" id="itemTitle">
 <span class="g-hdn">Details about  &nbsp;</span>
 Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…
</h1>
'''
doc = SimplifiedDoc(html)
# Select every h1.it-ttl, then take the text that follows its <span>
item_names = doc.selects('h1.it-ttl').span.nextText()

print(item_names)

Result:

['Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…']
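If you would rather stay with BeautifulSoup and leave the tree intact, a rough equivalent of taking only the text outside the `<span>` is to keep the `<h1>`'s direct string children and skip its child tags. This is a sketch under the same sample HTML, not part of either answer above:

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<h1 class="it-ttl" itemprop="name" id="itemTitle">
 <span class="g-hdn">Details about  &nbsp;</span>
 Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…
</h1>
"""

soup = BeautifulSoup(html, "html.parser")
h1 = soup.find("h1", class_="it-ttl")

# .children yields only direct children: whitespace strings, the <span> Tag,
# and the title string. Keeping non-empty NavigableStrings skips the <span>.
title = " ".join(
    child.strip()
    for child in h1.children
    if isinstance(child, NavigableString) and child.strip()
)
print(title)
```

Unlike `decompose()`, this leaves the `<span>` in the parsed document, which can matter if the same soup is queried for other fields afterwards.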

More examples are available at https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
