
I am new to Python and BeautifulSoup.

I am trying to get an attribute value from an <option> tag. The HTML snippet is below. Specifically, I am trying to retrieve the value of the first "data-inventory-quantity" attribute (in this case, 60).

import requests
import bs4
import lxml
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
import csv


def getTitle(soup):
    return soup.find('title').text

def getInventory(soup):
    pass  # TODO: return the data-inventory-quantity value

def getPrice(soup):
    return soup.find("meta", {"property" : "og:price:amount"}).attrs['content']


urlList = []

with open('output.csv', 'w', newline='')  as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Title', 'Inventory', 'Price'])
    
    for url in urlList:
        try:
            html = urlopen(url)
        except HTTPError as e:
            print(e)
        except URLError:
            print("error")
        else:
            soup = bs4.BeautifulSoup(html.read(), 'html.parser')
            row = [getTitle(soup),  getInventory(soup), getPrice(soup)]
            print(row)
            csv_output.writerow(row)

However, I need to run this against multiple URLs, and each one has a different option "value", so I cannot hard-code that value in my code. I tried calling soup.find on a higher-level tag, e.g. soup.find('select', id='variant-listbox')['data-inventory-quantity'], but that raises "KeyError: 'data-inventory-quantity'". Is there a way to find data-inventory-quantity when all the other attribute values within the option tag differ for each URL?

HTML:

        <div class="variants ">
          <select id="variant-listbox" name="id" class="medium">

              <option
                data-sku=""

                selected="selected"  value="40323576791107"

                  data-inventory-quantity="60"

              >
                Regular - $75.00
              </option>

              <option
                data-sku=""

                 value="40323576823875"

                  data-inventory-quantity="4"

              >
                Variant - $100.00
              </option>

          </select>
        </div>

2 Answers


  1. I prefer to use find_all_next to reach nested tags when parsing with bs4. Find each element by name, then read the value of its data-inventory-quantity attribute.
    Code below:

    import bs4
    
    code = ''' <div class="variants ">
                  <select id="variant-listbox" name="id" class="medium">
                    
                      <option
                        data-sku=""
                        
                        selected="selected"  value="40323576791107"
                        
                          data-inventory-quantity="60"
                        
                      >
                        Regular - $75.00
                      </option>
                    
                      <option
                        data-sku=""
                        
                         value="40323576823875"
                        
                          data-inventory-quantity="4"
                        
                      >
                        Variant - $100.00
                      </option>
                    
                  </select>
                </div>'''
    soup = bs4.BeautifulSoup(code, 'html.parser')
    print(soup.find_all('div')[0].find_all_next('select')[0].find_all_next('option',
                                                                           {'selected': 'selected'})[0].get('data-inventory-quantity'))
    
  2. Try:

    from bs4 import BeautifulSoup
    
    
    html_doc = '''
     <div class="variants ">
                  <select id="variant-listbox" name="id" class="medium">
    
                      <option
                        data-sku=""
    
                        selected="selected"  value="40323576791107"
    
                          data-inventory-quantity="60"
    
                      >
                        Regular - $75.00
                      </option>
    
                      <option
                        data-sku=""
    
                         value="40323576823875"
    
                          data-inventory-quantity="4"
    
                      >
                        Variant - $100.00
                      </option>
    
                  </select>
                </div>'''
    
    soup = BeautifulSoup(html_doc, 'html.parser')
    
    o = soup.select_one('option[data-inventory-quantity]')
    print(o['data-inventory-quantity'])
    

    Prints:

    60
    

    If you want to select the selected option:

    o = soup.select_one('option[data-inventory-quantity][selected]')
    print(o['data-inventory-quantity'])
    

    EDIT: As a getInventory(soup) function:

    def getInventory(soup):
        o = soup.select_one('option[data-inventory-quantity]')
        return o['data-inventory-quantity']
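
The KeyError in the question comes from indexing the <select> tag, which has no data-inventory-quantity attribute of its own; the attribute lives on the <option> children. A slightly more defensive variant of the selector approach might look like the sketch below. The function names get_inventory_or_none and get_all_inventory are hypothetical, and the sample HTML is a condensed copy of the snippet from the question. It assumes that a page without a matching option should yield None rather than raise.

```python
from bs4 import BeautifulSoup


def get_inventory_or_none(soup):
    """Return the inventory quantity as an int, or None if it is not found.

    Hypothetical helper: prefers the option marked as selected and falls
    back to the first option that has the attribute at all.
    """
    option = soup.select_one('option[data-inventory-quantity][selected]')
    if option is None:
        option = soup.select_one('option[data-inventory-quantity]')
    if option is None:
        return None
    try:
        return int(option['data-inventory-quantity'])
    except ValueError:
        # The attribute exists but does not hold a whole number.
        return None


def get_all_inventory(soup):
    """Map each option's value attribute to its quantity string (hypothetical helper)."""
    return {
        option.get('value'): option['data-inventory-quantity']
        for option in soup.select('option[data-inventory-quantity]')
    }


# Condensed copy of the HTML from the question.
sample = '''
<select id="variant-listbox" name="id" class="medium">
  <option data-sku="" selected="selected" value="40323576791107"
          data-inventory-quantity="60">Regular - $75.00</option>
  <option data-sku="" value="40323576823875"
          data-inventory-quantity="4">Variant - $100.00</option>
</select>
'''

soup = BeautifulSoup(sample, 'html.parser')
print(get_inventory_or_none(soup))   # 60
print(get_all_inventory(soup))       # {'40323576791107': '60', '40323576823875': '4'}
```

Returning None instead of raising lets the CSV loop in the question write an empty cell for pages that have no variant list, rather than stopping on the first such URL.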
    