I’m web scraping a bunch of heights for listed athletes. I have written the code to get the heights, but after inspecting the element I noticed that the visible text gives the height in feet and inches, while the "data-sort" attribute gives the same height in inches. Both are on the td tag with class "height". However, when I use get_text() or .text to strip the HTML, it only prints the height in feet and drops the hidden height in inches. Is there a way to get the height in inches? That would make it easier to do the math.
Here is an example of what I’m web scraping. I want to remove everything else and only get the heights in inches, which would be [79,85,74… in this case.
<td class="height" data-sort="79">6-7</td>
<td class="height" data-sort="85">7-1</td>
<td class="height" data-sort="74">6-2</td>
# This is my code
from bs4 import BeautifulSoup
import requests
urls=['https://goduke.com/sports/mens-basketball/roster']
ListData=[]
for x in range(len(urls)):
    page=requests.get(urls[x]).text
    pagesoup=BeautifulSoup(page,'html.parser')
    h=pagesoup.find_all('td', class_="height")
    ListData.append(h)
NewList=[]
for b in range(len(ListData)):
    new=[]
    for x in ListData[b]:
        print(x.text)
2 Answers
If you use a CSS selector, you can simply pass the first class name.
from scrapy.selector import Selector
output
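For comparison, the same idea can be sketched with BeautifulSoup, which the question already uses. This is only a minimal sketch, not the answerer's code: the `html` string is the sample markup from the question wrapped in a table, and `heights_in_inches` is a hypothetical helper name. It selects the cells with a CSS selector and reads the `data-sort` attribute instead of the cell text.

```python
from bs4 import BeautifulSoup

# Sample rows copied from the question (wrapped in a table for illustration).
# For a live page, the markup would come from requests.get(url).text instead.
html = """
<table><tr>
<td class="height" data-sort="79">6-7</td>
<td class="height" data-sort="85">7-1</td>
<td class="height" data-sort="74">6-2</td>
</tr></table>
"""

def heights_in_inches(markup):
    """Return the data-sort value of every td.height cell as an int."""
    soup = BeautifulSoup(markup, "html.parser")
    # Only match cells that actually carry a data-sort attribute.
    cells = soup.select("td.height[data-sort]")
    # Attributes are read like dictionary keys; .text would return "6-7" etc.
    return [int(td["data-sort"]) for td in cells]

print(heights_in_inches(html))  # [79, 85, 74]
```

In the question's loop, the equivalent change would be to print `x["data-sort"]` rather than `x.text`.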