I am trying to scrape the table
from baseball reference: https://www.baseball-reference.com/players/b/bondsba01.shtml, and the table I want is the one with id="batting_value"
, but when I trying to print
out what I have scraped, the program returned an empty list
instead. Any information or assistance is appreciated, thanks!
from bs4 import BeautifulSoup
from urllib.request import urlopen
root_page = "https://www.baseball-reference.com/players/b/bondsba01.shtml"
soup = BeautifulSoup(urlopen(root_page), features = 'lxml')
table = soup.find('table', id = 'batting_value')
print(table)
I’ve tried to print
the <div>
with id="div_batting_value"
which contains the table
in it, but still doesn’t work. However, I can successfully print out other <div>
elements with different id
.
2
Answers
Main issue here is that the
table
is hidden in the comments, so you have to bring it up first, beforeBeautifulSoup
could find it – simplest solution in my opinion is to replace the specific characters in this case:Alternative is to be more specific and use
bs4.Comment
Example
Or in use with
pandas.read_html()
:Results in:
There is only one table on the page:
output: 1
You can use simple find to get the table:
And work with it. For example, there are rows:
Does this solve your task?