I am a Python / BeautifulSoup newbie here.
I am trying to get an attribute value from within the <option> tag. The HTML snippet is below. Specifically, I am trying to retrieve the value of the first "data-inventory-quantity" attribute (in this case, 60).
import requests
import bs4
import lxml
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
import csv

def getTitle(soup):
    return soup.find('title').text

def getInventory(soup):
    # Body not shown in the post; per the description below, it looks up
    # the <option> by a hard-coded "value" (assumed form):
    return soup.find("option", {"value": "40323576791107"}).attrs['data-inventory-quantity']

def getPrice(soup):
    return soup.find("meta", {"property" : "og:price:amount"}).attrs['content']

urlList = []

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Title', 'Inventory', 'Price'])
    for url in urlList:
        try:
            html = urlopen(url)
        except HTTPError as e:
            print(e)
        except URLError:
            print("error")
        else:
            soup = bs4.BeautifulSoup(html.read(), 'html.parser')
            row = [getTitle(soup), getInventory(soup), getPrice(soup)]
            print(row)
            csv_output.writerow(row)
However, I need to run this against multiple URLs, each with a unique "value", and I cannot figure out how to edit my code so that it does not depend on this specific option "value". I have tried calling soup.find on a higher-level tag, e.g. soup.find('select', id='variant-listbox')['data-inventory-quantity'], but that gives me KeyError: 'data-inventory-quantity'. Is there any way to find the data-inventory-quantity when all the other attribute values within this option tag differ for each URL?
HTML:
<option
data-sku=""
selected="selected" value="40323576791107"
data-inventory-quantity="60"
>
Regular - $75.00
</option>
<option
data-sku=""
value="40323576823875"
data-inventory-quantity="4"
>
Variant - $100.00
</option>
</select>
</div>'''
2 Answers
I prefer to use find_all_next to get sub-tags when parsing with bs4. Find every element by name and read the value of its data-inventory-quantity attribute. Code below.
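A minimal sketch of this approach, not necessarily the exact code from the answer. It assumes the options sit inside a `<select id="variant-listbox">` as mentioned in the question (the snippet only shows the closing tag), and the names `html`, `select` and `quantities` are illustrative:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup; the opening <select> tag is an
# assumption, since the posted snippet only shows its closing tag.
html = '''
<select id="variant-listbox">
  <option data-sku="" selected="selected" value="40323576791107"
          data-inventory-quantity="60">Regular - $75.00</option>
  <option data-sku="" value="40323576823875"
          data-inventory-quantity="4">Variant - $100.00</option>
</select>
'''

soup = BeautifulSoup(html, 'html.parser')

# Start at the <select> and walk forward through every later <option>,
# keeping only those that actually carry the attribute.
select = soup.find('select', id='variant-listbox')
quantities = [
    option['data-inventory-quantity']
    for option in select.find_all_next('option')
    if option.has_attr('data-inventory-quantity')
]

print(quantities)     # every quantity, in document order
print(quantities[0])  # the first one
```

Note that find_all_next does not stop at the closing `</select>`; it keeps going to the end of the document, so any later `<option>` elements on the page would be included too.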
Try:
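One way this could look, as a sketch rather than the answer's original code: match the `<option>` on the presence of the data-inventory-quantity attribute, so the per-URL "value" never appears in the lookup. The `html` string is a trimmed copy of the question's snippet, with the opening `<select>` assumed.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (opening <select> is assumed).
html = '''
<select id="variant-listbox">
  <option data-sku="" selected="selected" value="40323576791107"
          data-inventory-quantity="60">Regular - $75.00</option>
  <option data-sku="" value="40323576823875"
          data-inventory-quantity="4">Variant - $100.00</option>
</select>
'''

soup = BeautifulSoup(html, 'html.parser')

# True means "the attribute exists, whatever its value is", so find()
# returns the first <option> that has data-inventory-quantity at all.
option = soup.find('option', attrs={'data-inventory-quantity': True})
print(option['data-inventory-quantity'])
```

The KeyError in the question arises because the attribute is set on each `<option>`, not on the enclosing `<select>`.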
Prints:
If you want to select the selected option:
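A sketch using a CSS attribute selector, assuming the page marks the active variant with a `selected` attribute as in the question's snippet (the `html` string and the `selected` variable are illustrative):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (opening <select> is assumed).
html = '''
<select id="variant-listbox">
  <option data-sku="" selected="selected" value="40323576791107"
          data-inventory-quantity="60">Regular - $75.00</option>
  <option data-sku="" value="40323576823875"
          data-inventory-quantity="4">Variant - $100.00</option>
</select>
'''

soup = BeautifulSoup(html, 'html.parser')

# option[selected] matches any <option> that has a "selected" attribute,
# regardless of that attribute's value.
selected = soup.select_one('option[selected]')
print(selected['data-inventory-quantity'])
```

select_one returns None when nothing matches, so a page without a pre-selected option would need a check before indexing.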
EDIT: To have a getInventory(soup) function:
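A possible drop-in for the question's script, sketched under the assumption that the first `<option>` carrying the attribute is the one wanted. Returning None when no such option exists is a choice made here, not something the question specifies.

```python
from bs4 import BeautifulSoup


def getInventory(soup):
    """Return the first data-inventory-quantity on the page, or None."""
    # Match on the attribute's presence, not on the per-URL "value".
    option = soup.find('option', attrs={'data-inventory-quantity': True})
    if option is None:
        return None  # assumption: pages without the attribute yield None
    return option['data-inventory-quantity']


# Quick check against a trimmed copy of the question's markup.
html = '''
<select id="variant-listbox">
  <option data-sku="" selected="selected" value="40323576791107"
          data-inventory-quantity="60">Regular - $75.00</option>
  <option data-sku="" value="40323576823875"
          data-inventory-quantity="4">Variant - $100.00</option>
</select>
'''
soup = BeautifulSoup(html, 'html.parser')
print(getInventory(soup))
```

The value comes back as a string; wrap it in int() before writing the CSV row if a number is needed.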