
I have some code that grabs a price from this HTML section:

<div class="main">
    <div class="cost-box">
        <ins><span>$</span><price>10.00</price></ins>
    </div>
</div>

Here’s the code I use to get the 10.00 price:

import requests
from bs4 import BeautifulSoup as bs

url = "https://www.sample.com/sample/123abcd"

response = requests.get(url).text
soup = bs(response, "html.parser")

container = soup.find("div", class_="cost-box")
price = container.price    # <-- get <price> tag from container
print(price.text)

The only problem is that some pages don't have prices on them and only have something like this in their HTML:

<div class="cost-box">
   
</div>

My code then fails with this error:

price = container.price.text
AttributeError: 'NoneType' object has no attribute 'text'

Is there a way to check whether the price tag exists? Instead of raising an error and stopping the whole program, I just want the price to be "invalid" so that the code can continue (it runs inside a for loop).

3 Answers


  1. You can use a conditional expression (if-else) to fall back to "Invalid" when the price tag is missing:

    from bs4 import BeautifulSoup as bs
    
    # price missing:
    html_text = """
    <div class="main">
        <div class="cost-box">
    
        </div>
    </div>"""
    
    soup = bs(html_text, "html.parser")
    
    container = soup.find("div", class_="cost-box")
    price = container.price.text if container.price else "Invalid"
    print(price)
    

    Prints:

    Invalid
    
  2. Check whether container exists first with if container; this avoids errors if that div is missing entirely. Then check whether container.find("price") returns a result before assigning to price. If no tag is found, set price to "Invalid" instead.

    import requests
    from bs4 import BeautifulSoup
    
    url = "https://www.sample.com/sample/123abcd"
    
    response = requests.get(url).text
    soup = BeautifulSoup(response, "html.parser")
    
    container = soup.find("div", class_="cost-box")
    
    if container and container.find("price"):
        price = container.find("price").text
    else:
        price = "Invalid"
    
    print(price)
    
  3. Get the text from the div tag instead of the price tag. When there is no price, the div's stripped text is just an empty string rather than an error.

    from bs4 import BeautifulSoup
    
    html = """<div class="main">
        <div class="cost-box">
            <ins><span>$</span><price>10.00</price></ins>
        </div>
        <div class="cost-box">
        </div>
    </div>"""
    
    soup = BeautifulSoup(html, 'html.parser')
    cost_box = soup.find_all('div', {'class': 'cost-box'})  # find_all is the current name for findAll
    
    print([cost.text.strip() for cost in cost_box]) # --> ['$10.00', '']
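
Since the question mentions running this inside a for loop, the checks from the answers above could be wrapped in a small helper. This is only a minimal sketch: the function name extract_price, its default parameter, and the inline pages list (standing in for requests.get(url).text) are assumptions made for illustration, not part of the original code.

```python
from bs4 import BeautifulSoup


def extract_price(html, default="Invalid"):
    """Return the text of the first <price> tag inside div.cost-box, or default."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_="cost-box")
    # find() returns None when nothing matches, so guard both lookups
    tag = container.find("price") if container is not None else None
    return tag.get_text(strip=True) if tag is not None else default


# Hypothetical page bodies standing in for requests.get(url).text
pages = [
    '<div class="cost-box"><ins><span>$</span><price>10.00</price></ins></div>',
    '<div class="cost-box">   </div>',
    '<p>no cost box at all</p>',
]

# The loop keeps going even when a page has no price
prices = [extract_price(page) for page in pages]
print(prices)  # ['10.00', 'Invalid', 'Invalid']
```

Returning a default instead of raising keeps the loop body free of try/except, and the same helper also covers the case where the cost-box div itself is missing.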
    