Html - Python / BeautifulSoup get an attribute within <option>

0range
June 14, 2023

I am a Python / BeautifulSoup newbie.

I am trying to get an attribute value from within an <option> tag. The HTML snippet is below. Specifically, I am trying to retrieve the value of the first "data-inventory-quantity" attribute (in this case, 60).

import requests
import bs4
import lxml
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
import csv


def getTitle(soup):
    return soup.find('title').text

def getInventory(soup):
    return soup.find("option", {"value" : "40323576791107"}).attrs['data-inventory-quantity']

def getPrice(soup):
    return soup.find("meta", {"property" : "og:price:amount"}).attrs['content']


urlList = []

with open('output.csv', 'w', newline='')  as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Title', 'Inventory', 'Price'])
    
    for url in urlList:
        try:
            html = urlopen(url)
        except HTTPError as e:
            print(e)
        except URLError:
            print("error")
        else:
            soup = bs4.BeautifulSoup(html.read(), 'html.parser')
            row = [getTitle(soup),  getInventory(soup), getPrice(soup)]
            print(row)
            csv_output.writerow(row)

However, I need to run this against multiple URLs, each with its own unique option "value", and I cannot figure out how to edit my code so that it does not depend on this specific "value". I tried calling soup.find on a higher-level tag, e.g. soup.find('select', id='variant-listbox')['data-inventory-quantity'], but that gives me "KeyError: 'data-inventory-quantity'". Is there any way to find the data-inventory-quantity when all the other attribute values within this option tag differ for each URL?

HTML:

        <div class="variants ">
          <select id="variant-listbox" name="id" class="medium">

              <option
                data-sku=""

                selected="selected"  value="40323576791107"

                  data-inventory-quantity="60"

              >
                Regular - $75.00
              </option>

              <option
                data-sku=""

                 value="40323576823875"

                  data-inventory-quantity="4"

              >
                Variant - $100.00
              </option>

          </select>
        </div>

Answers

I prefer to use find_all_next to reach nested tags when parsing with bs4: find each element by name, then read the value of the data-inventory-quantity attribute.
Code below.

import bs4

code = ''' <div class="variants ">
              <select id="variant-listbox" name="id" class="medium">
                
                  <option
                    data-sku=""
                    
                    selected="selected"  value="40323576791107"
                    
                      data-inventory-quantity="60"
                    
                  >
                    Regular - $75.00
                  </option>
                
                  <option
                    data-sku=""
                    
                     value="40323576823875"
                    
                      data-inventory-quantity="4"
                    
                  >
                    Variant - $100.00
                  </option>
                
              </select>
            </div>'''
soup = bs4.BeautifulSoup(code, 'html.parser')
select = soup.find_all('div')[0].find_all_next('select')[0]
option = select.find_all_next('option', {'selected': 'selected'})[0]
print(option.get('data-inventory-quantity'))
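
A note on the approach above: find_all_next walks everything that follows a tag in document order, not only its children, so on a full page it could also match an <option> belonging to a later <select>. Below is a minimal sketch of a scoped alternative, assuming the same markup as in the question (the names snippet, select_tag, first_option and selected_option are illustrative only). It also suggests why indexing the <select> itself raises KeyError: the attribute lives on the <option> children, not on the <select>.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumed representative).
snippet = '''
<div class="variants ">
  <select id="variant-listbox" name="id" class="medium">
    <option data-sku="" selected="selected" value="40323576791107"
            data-inventory-quantity="60">Regular - $75.00</option>
    <option data-sku="" value="40323576823875"
            data-inventory-quantity="4">Variant - $100.00</option>
  </select>
</div>
'''

soup = BeautifulSoup(snippet, 'html.parser')
select_tag = soup.find('select', id='variant-listbox')

# The <select> carries only id, name and class, hence the KeyError.
print('data-inventory-quantity' in select_tag.attrs)  # False

# Search only inside the <select>; find() returns the first match.
first_option = select_tag.find('option')
print(first_option['data-inventory-quantity'])  # 60

# Restrict to the option marked as selected, whatever its "value" is.
selected_option = select_tag.find('option', selected=True)
print(selected_option['data-inventory-quantity'])  # 60
```

Neither lookup depends on the option's "value", so the same code should work across URLs as long as the pages keep the variant-listbox id.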

Try:

from bs4 import BeautifulSoup


html_doc = '''
 <div class="variants ">
              <select id="variant-listbox" name="id" class="medium">

                  <option
                    data-sku=""

                    selected="selected"  value="40323576791107"

                      data-inventory-quantity="60"

                  >
                    Regular - $75.00
                  </option>

                  <option
                    data-sku=""

                     value="40323576823875"

                      data-inventory-quantity="4"

                  >
                    Variant - $100.00
                  </option>

              </select>
            </div>'''

soup = BeautifulSoup(html_doc, 'html.parser')

o = soup.select_one('option[data-inventory-quantity]')
print(o['data-inventory-quantity'])

Prints:

60

If you want to select the selected option:

o = soup.select_one('option[data-inventory-quantity][selected]')
print(o['data-inventory-quantity'])
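
The same attribute selector also works with select(), which returns every match instead of only the first. As a rough sketch, assuming you might want the quantity of each variant rather than just one (the inventory dictionary and the trimmed html_doc are illustrative, not part of the question):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumed representative).
html_doc = '''
<select id="variant-listbox" name="id" class="medium">
  <option data-sku="" selected="selected" value="40323576791107"
          data-inventory-quantity="60">
    Regular - $75.00
  </option>
  <option data-sku="" value="40323576823875"
          data-inventory-quantity="4">
    Variant - $100.00
  </option>
</select>
'''

soup = BeautifulSoup(html_doc, 'html.parser')

# select() returns a list of every matching tag, in document order.
# Map each option's visible label to its quantity (still a string).
inventory = {
    o.get_text(strip=True): o['data-inventory-quantity']
    for o in soup.select('option[data-inventory-quantity]')
}
print(inventory)  # {'Regular - $75.00': '60', 'Variant - $100.00': '4'}
```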

EDIT: As a getInventory(soup) function:

def getInventory(soup):
    o = soup.select_one('option[data-inventory-quantity]')
    return o['data-inventory-quantity']
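
Since the loop in the question runs over many URLs, some pages may have no matching <option>; select_one then returns None and the subscript raises TypeError. Here is a minimal sketch of a more defensive variant, assuming that returning None for such pages is acceptable. The name getInventorySafe, the int conversion and the two tiny test pages are my own additions, not part of the original code:

```python
from bs4 import BeautifulSoup


def getInventorySafe(soup):
    """Return the first option's inventory quantity as an int, or None."""
    o = soup.select_one('option[data-inventory-quantity]')
    if o is None:
        # No <option> with the attribute on this page.
        return None
    try:
        return int(o['data-inventory-quantity'])
    except ValueError:
        # Attribute present but not numeric (e.g. an empty string).
        return None


# Hypothetical pages: one with a variant option, one without.
page_with = BeautifulSoup(
    '<select><option value="1" data-inventory-quantity="60">A</option></select>',
    'html.parser')
page_without = BeautifulSoup('<p>No variants here</p>', 'html.parser')

print(getInventorySafe(page_with))     # 60
print(getInventorySafe(page_without))  # None
```

With this variant, csv.writer writes an empty field for None, so a page without inventory data still produces a row.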
