Html - How to scrape specific element with a certain id in BeautifulSoup?

ZincCheng
May 14, 2023
2 Answers

I am trying to scrape a table from Baseball Reference: https://www.baseball-reference.com/players/b/bondsba01.shtml. The table I want is the one with id="batting_value", but when I print what I have scraped, the program returns an empty result instead of the table. Any information or assistance is appreciated, thanks!

from bs4 import BeautifulSoup
from urllib.request import urlopen

root_page = "https://www.baseball-reference.com/players/b/bondsba01.shtml"
soup = BeautifulSoup(urlopen(root_page), features = 'lxml')

table = soup.find('table', id = 'batting_value')
print(table)

I’ve also tried to print the <div> with id="div_batting_value", which contains the table, but that doesn’t work either. However, I can successfully print other <div> elements with different ids.

Answers

The main issue here is that the table is wrapped in an HTML comment, so BeautifulSoup does not see it as an element. You have to uncomment it before BeautifulSoup can find it. The simplest solution, in my opinion, is to strip the comment markers from the response text:

.replace('<!--','').replace('-->','')
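
To see the effect without requesting the site, here is a minimal sketch on a made-up snippet. The `sample` markup and the `find_table` helper are assumptions for illustration only; they imitate the page's structure rather than reproduce it, and `html.parser` is used so no extra parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the page: the table sits inside a comment
sample = (
    '<div id="all_batting_value">'
    '<!-- <table id="batting_value"><tr><td>3.5</td></tr></table> -->'
    '</div>'
)


def find_table(html, table_id):
    """Return the <table> with the given id, or None if the parser cannot see it."""
    return BeautifulSoup(html, "html.parser").find("table", id=table_id)


# Inside a comment, the table is just text to the parser
before = find_table(sample, "batting_value")

# Stripping the comment markers turns it into regular markup
uncommented = sample.replace("<!--", "").replace("-->", "")
after = find_table(uncommented, "batting_value")

print(before)       # None
print(after["id"])  # batting_value
```

Note that this uncomments every comment in the document, not only the one holding the table, which is usually harmless for scraping but worth keeping in mind.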

An alternative is to be more specific and use bs4.Comment to parse only the comment that contains the table.
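
The bs4.Comment approach could look roughly like the sketch below. The helper name `find_in_comments` and the `sample` markup are hypothetical, and the example runs on an inline string instead of the live page; treat it as one possible way to do it, not the only one.

```python
from bs4 import BeautifulSoup, Comment


def find_in_comments(html, element_id, parser="html.parser"):
    """Find an element by id, also looking inside HTML comments."""
    soup = BeautifulSoup(html, parser)

    # Prefer an element that is part of the regular markup
    found = soup.find(id=element_id)
    if found is not None:
        return found

    # Otherwise parse each comment that mentions the id as its own document
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if element_id not in comment:
            continue
        found = BeautifulSoup(str(comment), parser).find(id=element_id)
        if found is not None:
            return found
    return None


# Hypothetical markup imitating the page structure
sample = """
<div id="all_batting_value">
  <!--
  <div id="div_batting_value">
    <table id="batting_value"><tbody><tr><th>1986</th><td>PIT</td></tr></tbody></table>
  </div>
  -->
</div>
"""

table = find_in_comments(sample, "batting_value")
print(table.name)  # table
```

Compared with the blanket replace, this leaves unrelated comments untouched and only parses the ones that mention the id you are looking for.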

Example

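import requests
from bs4 import BeautifulSoup

url = 'https://www.baseball-reference.com/players/b/bondsba01.shtml'
# Strip the comment markers so the commented-out table becomes regular markup
html = requests.get(url).text.replace('<!--', '').replace('-->', '')
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'lxml')
table = soup.select_one('#batting_value')
print(table)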

Or combined with pandas.read_html():

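from io import StringIO
import pandas as pd
import requests

url = 'https://www.baseball-reference.com/players/b/bondsba01.shtml'
html = requests.get(url).text.replace('<!--', '').replace('-->', '')
# Newer pandas versions warn when given a literal HTML string, so wrap it in StringIO
df = pd.read_html(StringIO(html), attrs={'id': 'batting_value'})[0]
# Keep only rows that have a league value and are not repeated header rows
df[(~df.Lg.isna()) & (df.Lg != 'Lg')]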

Results in:

	Year	Age	Tm	Lg	G	PA	Rbat	Rbaser	Rdp	Rfield	Rpos	RAA	WAA	Rrep	RAR	WAR	waaWL%	162WL%	oWAR	dWAR	oRAR	Salary	Pos	Awards
0	1986	21	PIT	NL	113	484	3	5	0	8	1	17	1.9	16	34	3.5	0.517	0.512	2.6	1	25	$60,000	*8/H	RoY-6
1	1987	22	PIT	NL	150	611	11	3	1	24	-3	36	3.7	21	57	5.8	0.525	0.523	3.2	2.1	33	$100,000	*78H/9	nan
…
20	2006	41	SFG	NL	130	493	30	1	0	1	-4	27	2.5	15	42	4	0.52	0.516	3.9	-0.4	41	$19,331,470	*7H/D	nan
21	2007	42	SFG	NL	126	477	37	-1	-1	-10	-4	21	2	15	36	3.4	0.516	0.513	4.4	-1.5	46	$15,533,970	*7H/D	AS
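
The row filter in the pandas example can be illustrated on a small hand-made frame. The rows below are hypothetical and only imitate what read_html presumably returns for such a table (a header row repeated inside the body and a summary row with no league); they are not the real scraped data.

```python
import pandas as pd

# Hypothetical rows: two seasons, one repeated header row, one summary row
df = pd.DataFrame({
    "Year": ["1986", "Year", "1987", "Totals"],
    "Tm": ["PIT", "Tm", "PIT", None],
    "Lg": ["NL", "Lg", "NL", None],
})

# Drop rows with a missing league and rows that merely repeat the header
season_rows = df[(~df.Lg.isna()) & (df.Lg != "Lg")]

print(season_rows.Year.tolist())  # ['1986', '1987']
```

The original index labels are kept after filtering; call `reset_index(drop=True)` on the result if you want a fresh 0-based index.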

- DmitriiMalygin
- May 11, 2023 at 5:15 pm
BeautifulSoup finds only one table in the parsed page:
```
print(len(soup.find_all('table')))
```
output: 1

You can use a simple find to get that table:
```
table = soup.find('table')
```
Then work with it. For example, to get the header cells of its body rows:
```
table.find('tbody').find_all('th')
```
Does this solve your task?