I have the following table:
<table class="suppress_all sortable stats_table now_sortable sticky_table eq1 re1 le1" id="divs_standings_W" data-cols-to-freeze=",1">
<caption> Table</caption>
<colgroup><col><col><col><col><col><col><col><col></colgroup>
<thead>
<tr>
<th aria-label="Western Conference" data-stat="team_name" scope="col" class=" poptip sort_default_asc left">Western Conference</th>
<th aria-label="Wins" data-stat="wins" scope="col" class=" poptip right" data-tip="Wins">W</th>
<th aria-label="Losses" data-stat="losses" scope="col" class=" poptip right" data-tip="Losses">L</th>
<th aria-label="Win-Loss Percentage" data-stat="win_loss_pct" scope="col" class=" poptip right" data-tip="Win-Loss Percentage">W/L%</th>
<th aria-label="GB" data-stat="gb" scope="col" class=" poptip sort_default_asc right" data-tip="Games Behind">GB</th>
<th aria-label="Points Per Game" data-stat="pts_per_g" scope="col" class=" poptip right" data-tip="Points Per Game">PS/G</th>
<th aria-label="Opponent Points Per Game" data-stat="opp_pts_per_g" scope="col" class=" poptip right" data-tip="Opponent Points Per Game">PA/G</th>
<th aria-label="Simple Rating System" data-stat="srs" scope="col" class=" poptip right" data-tip="Simple Rating System; a team rating that takes into account average point differential and strength of schedule. The rating is denominated in points above/below average, where zero is average.">SRS</th>
</tr>
</thead>
<tbody><tr class="thead" data-row="0"><th colspan="8"><strong>Northwest Division</strong></th></tr>
<tr class="full_table" data-row="1"><th scope="row" class="left " data-stat="team_name"><a href="/teams/UTA/2021.html">Utah Jazz</a>*</th><td class="right " data-stat="wins">52</td><td class="right " data-stat="losses">20</td><td class="right " data-stat="win_loss_pct">.722</td><td class="right " data-stat="gb">—</td><td class="right " data-stat="pts_per_g">116.4</td><td class="right " data-stat="opp_pts_per_g">107.2</td>
I am trying to scrape the table and remove the tr rows with class thead using .decompose(), but they are not removed.
This is my code:
dfs_player = []
for year in years:
    with open("player/{}.html".format(year)) as f:
        page = f.read()
    soup = BeautifulSoup(page, "html.parser")
    soup.find("tr", class_="thead").decompose()
    player_table = soup.find(id="div_per_game_stats")
    player = pd.read_html(str(player_table))[0]
    player['Year'] = year
    dfs_player.append(player)
Here is the example URL: text
Can you help me, please?
2 Answers
I’m not sure that thead is a class; I think it is more of a tag. You can test whether this code works:
Or this code:
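For illustration, here is a minimal sketch of removing every matching row instead of only the first one. It assumes the rows to drop are the tr elements with class thead, as in the HTML from the question. The sample markup, the "Placeholder Team" row, and the strip_header_rows helper are made up for this example and are not code from the thread. The likely reason a single call seems to do nothing is that soup.find() returns only the first match, so one .decompose() leaves any later thead rows in place.

```python
from bs4 import BeautifulSoup

# Trimmed-down sample shaped like the table in the question.
# The second division and "Placeholder Team" are invented placeholders.
SAMPLE_HTML = """
<table id="divs_standings_W">
<thead><tr><th>Western Conference</th><th>W</th><th>L</th></tr></thead>
<tbody>
<tr class="thead"><th colspan="3"><strong>Northwest Division</strong></th></tr>
<tr class="full_table"><th>Utah Jazz*</th><td>52</td><td>20</td></tr>
<tr class="thead"><th colspan="3"><strong>Placeholder Division</strong></th></tr>
<tr class="full_table"><th>Placeholder Team</th><td>1</td><td>2</td></tr>
</tbody>
</table>
"""


def strip_header_rows(html):
    """Parse html and delete every <tr class="thead"> row (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list of all matches, so each one can be decomposed in turn.
    for row in soup.find_all("tr", class_="thead"):
        row.decompose()
    return soup


soup = strip_header_rows(SAMPLE_HTML)
# Collect the first cell of each remaining body row to check what survived.
remaining = [tr.th.get_text() for tr in soup.find("tbody").find_all("tr")]
print(remaining)
```

The real <thead> element is untouched here because the filter is on the class attribute of tr, not on the tag name.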
As an alternative, why don’t you just ask pandas to read the actual page HTML source and find the <table> by itself? This can be done with the pandas.read_html function:
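A rough sketch of that approach, under a few assumptions: the table is selected by the id from the question (divs_standings_W), the sample markup and the "Placeholder Team" row are invented for the example, and the division rows are filtered out by checking whether the W column is numeric. That filter is just one possible choice, not something taken from the thread. pandas.read_html also needs an HTML parser backend installed (lxml, or bs4 together with html5lib).

```python
from io import StringIO

import pandas as pd

# Invented sample shaped like the table in the question.
SAMPLE_HTML = """
<table id="divs_standings_W">
<thead><tr><th>Western Conference</th><th>W</th><th>L</th></tr></thead>
<tbody>
<tr class="thead"><th colspan="3"><strong>Northwest Division</strong></th></tr>
<tr class="full_table"><th>Utah Jazz*</th><td>52</td><td>20</td></tr>
<tr class="thead"><th colspan="3"><strong>Placeholder Division</strong></th></tr>
<tr class="full_table"><th>Placeholder Team</th><td>1</td><td>2</td></tr>
</tbody>
</table>
"""

# attrs narrows the search to the table with this id; read_html returns a list.
# A file path or URL could be passed instead of the StringIO wrapper.
df = pd.read_html(StringIO(SAMPLE_HTML), attrs={"id": "divs_standings_W"})[0]

# Division rows have no numeric win count, so coercing W to a number
# turns them into NaN and they can be dropped.
is_team_row = pd.to_numeric(df["W"], errors="coerce").notna()
standings = df[is_team_row].copy()
standings["W"] = pd.to_numeric(standings["W"])
standings["L"] = pd.to_numeric(standings["L"])

print(standings)
```

With this route there is no need to call .decompose() at all, since the unwanted rows are filtered after the table has been parsed into a DataFrame.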