
I’m building a Python web scraper that goes through an eBay search results page (in this case, ‘Gaming laptops’) and grabs the title of each item for sale. I’m using BeautifulSoup to grab each h1 tag that holds a title and then print its text:

    for item_name in soup.find_all('h1', {'class': 'it-ttl'}):
        print(item_name.text)

However, within each h1 tag with the class of ‘it-ttl’, there is also a span tag that contains some text:

    <h1 class="it-ttl" itemprop="name" id="itemTitle">
     <span class="g-hdn">Details about  &nbsp;</span>
     Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…
    </h1>

My current program prints both the contents of the span tag and the item title, so the output contains the “Details about” text followed by the title.

Could someone explain how to grab just the item title while ignoring the span tag containing the “Details about” text? Thanks!
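A minimal sketch showing where the extra text comes from, assuming `bs4` with Python's built-in `html.parser` (the question does not say which parser it uses): `.text` gathers every string under the `<h1>`, including the one inside the nested `<span>`, and `stripped_strings` makes the two separate pieces visible.

```python
from bs4 import BeautifulSoup

# The sample markup from the question (title truncated with "…" as in the original).
html = '''
<h1 class="it-ttl" itemprop="name" id="itemTitle">
 <span class="g-hdn">Details about  &nbsp;</span>
 Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…
</h1>
'''

soup = BeautifulSoup(html, "html.parser")
h1 = soup.find("h1", class_="it-ttl")

# .text (equivalent to get_text()) concatenates every string inside the tag,
# including the text of nested tags such as the hidden <span>.
print("Details about" in h1.text)  # True

# stripped_strings yields each descendant string with surrounding whitespace
# (including the non-breaking space from &nbsp;) removed.
pieces = list(h1.stripped_strings)
print(pieces)
```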


Answers


  1. It can be done by just removing the offending <span>:

    from bs4 import BeautifulSoup as bs

    item = """
    <h1 class="it-ttl" itemprop="name" id="itemTitle">
     <span class="g-hdn">Details about  &nbsp;</span>
     Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…
    </h1>
    """
    soup = bs(item, 'lxml')  # needs the lxml package; 'html.parser' also works
    target = soup.select_one('h1')
    # Remove the <span> (and its "Details about" text) from the tree
    target.select_one('span').decompose()
    print(target.text.strip())
    

    Output:

    Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…
    
  2. Another solution, using the simplified_scrapy library:

    from simplified_scrapy import SimplifiedDoc
    html = '''
    <h1 class="it-ttl" itemprop="name" id="itemTitle">
     <span class="g-hdn">Details about  &nbsp;</span>
     Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…
    </h1>
    '''
    doc = SimplifiedDoc(html)
    item_names = doc.selects('h1.it-ttl').span.nextText()
    
    print(item_names)
    

    Result:

    ['Acer - Nitro 5 15.6" Gaming Laptop - Intel Core i5 - 8GB Memory - NVIDIA GeFo…']
    

    Here are more examples. https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
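The first answer edits a single `<h1>` in place. A non-destructive alternative that plugs into the question's loop is sketched below, assuming `bs4` with `html.parser`: instead of deleting the `<span>`, keep only the strings that are direct children of each `<h1>`. The two-item markup and its titles are hypothetical, made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with two listings, modelled on the question's markup.
html = '''
<h1 class="it-ttl"><span class="g-hdn">Details about  &nbsp;</span>
 First laptop title
</h1>
<h1 class="it-ttl"><span class="g-hdn">Details about  &nbsp;</span>
 Second laptop title
</h1>
'''

soup = BeautifulSoup(html, "html.parser")

titles = []
for item_name in soup.find_all("h1", class_="it-ttl"):
    # recursive=False limits the search to the h1's direct children,
    # so the string inside the nested <span> is skipped.
    own_text = item_name.find_all(string=True, recursive=False)
    titles.append("".join(own_text).strip())

print(titles)
```

Because nothing is removed from the tree, the `<span>` text is still there if you need it later; the trade-off is that title text wrapped in any other child tag would be skipped too.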
