
I have the following table:

<table class="suppress_all sortable stats_table now_sortable sticky_table eq1 re1 le1" id="divs_standings_W" data-cols-to-freeze=",1">
    <caption>&nbsp; Table</caption>
    

   <colgroup><col><col><col><col><col><col><col><col></colgroup>
   <thead>      
      <tr>
         <th aria-label="Western Conference" data-stat="team_name" scope="col" class=" poptip sort_default_asc left">Western Conference</th>
         <th aria-label="Wins" data-stat="wins" scope="col" class=" poptip right" data-tip="Wins">W</th>
         <th aria-label="Losses" data-stat="losses" scope="col" class=" poptip right" data-tip="Losses">L</th>
         <th aria-label="Win-Loss Percentage" data-stat="win_loss_pct" scope="col" class=" poptip right" data-tip="Win-Loss Percentage">W/L%</th>
         <th aria-label="GB" data-stat="gb" scope="col" class=" poptip sort_default_asc right" data-tip="Games Behind">GB</th>
         <th aria-label="Points Per Game" data-stat="pts_per_g" scope="col" class=" poptip right" data-tip="Points Per Game">PS/G</th>
         <th aria-label="Opponent Points Per Game" data-stat="opp_pts_per_g" scope="col" class=" poptip right" data-tip="Opponent Points Per Game">PA/G</th>
         <th aria-label="Simple Rating System" data-stat="srs" scope="col" class=" poptip right" data-tip="Simple Rating System; a team rating that takes into account average point differential and strength of schedule. The rating is denominated in points above/below average, where zero is average.">SRS</th>
</tr>
      </thead>
<tbody><tr class="thead" data-row="0"><th colspan="8"><strong>Northwest Division</strong></th></tr>
<tr class="full_table" data-row="1"><th scope="row" class="left " data-stat="team_name"><a href="/teams/UTA/2021.html">Utah Jazz</a>*</th><td class="right " data-stat="wins">52</td><td class="right " data-stat="losses">20</td><td class="right " data-stat="win_loss_pct">.722</td><td class="right " data-stat="gb">—</td><td class="right " data-stat="pts_per_g">116.4</td><td class="right " data-stat="opp_pts_per_g">107.2</td>

I am trying to scrape the table and remove the header rows (<tr class="thead">) with .decompose(), but they were not removed.

This is my code:

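from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

years = [2021, 2022, 2023]  # example values; use the seasons you saved

dfs_player = []
for year in years:
    # Load the locally saved copy of the page for this season
    with open("player/{}.html".format(year), encoding="utf-8") as f:
        page = f.read()
    soup = BeautifulSoup(page, "html.parser")
    # Try to remove the header row(s) with class="thead" before parsing
    soup.find("tr", class_="thead").decompose()
    player_table = soup.find(id="div_per_game_stats")
    # Wrap the HTML string in StringIO; newer pandas deprecates literal strings
    player = pd.read_html(StringIO(str(player_table)))[0]
    player['Year'] = year

    dfs_player.append(player)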
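A likely reason the rows survive: soup.find() returns only the first matching element, so .decompose() removes a single <tr class="thead"> and leaves any others in place. A minimal sketch, using a small hypothetical sample modeled on the table above ("Second Division" and "Example Team" are placeholders, not real data), that compares find() with find_all():

```python
from bs4 import BeautifulSoup

# Hypothetical sample modeled on the question's table: two division
# header rows (class="thead") and two team rows. "Second Division" and
# "Example Team" are placeholders, not real data.
HTML = """
<table id="divs_standings_W">
  <tbody>
    <tr class="thead"><th colspan="8"><strong>Northwest Division</strong></th></tr>
    <tr class="full_table"><th data-stat="team_name">Utah Jazz</th><td data-stat="wins">52</td></tr>
    <tr class="thead"><th colspan="8"><strong>Second Division</strong></th></tr>
    <tr class="full_table"><th data-stat="team_name">Example Team</th></tr>
  </tbody>
</table>
"""

# What the question's code does: find() returns only the FIRST match,
# so only one header row is removed.
soup_first = BeautifulSoup(HTML, "html.parser")
soup_first.find("tr", class_="thead").decompose()
left_after_find = len(soup_first.find_all("tr", class_="thead"))
print(left_after_find)  # 1

# Removing every match: find_all() returns a list of all matching rows.
soup_all = BeautifulSoup(HTML, "html.parser")
for row in soup_all.find_all("tr", class_="thead"):
    row.decompose()

# Every remaining row is a team row with a <th> holding the team name.
team_names = [tr.th.get_text() for tr in soup_all.find_all("tr")]
print(team_names)  # ['Utah Jazz', 'Example Team']
```

In the loop above, the same for-loop over find_all() would take the place of the single soup.find(...).decompose() line.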

And here is the example URL: text

Can you guys help me, please?


Answers


  1. I’m not sure thead is a class; I think it’s a tag. You can test whether this code works:

    soup.find("thead").decompose()
    

    Or this code:

    for tr in soup.find("thead").find_all("tr"):
        tr.decompose()
    
  2. As an alternative, why don’t you just let pandas read the page’s HTML and find the <table> by itself? This can be done with the read_html function:

    import pandas as pd
    
    
    url = 'https://www.basketball-reference.com/leagues/NBA_2023_per_game.html'
    
    dfs = pd.read_html(url)
    print(dfs[0])
    

    Output:

          Rk            Player Pos Age   Tm   G  ...   AST  STL  BLK  TOV   PF   PTS
    0      1  Precious Achiuwa   C  23  TOR  55  ...   0.9  0.6  0.5  1.1  1.9   9.2
    1      2      Steven Adams   C  29  MEM  42  ...   2.3  0.9  1.1  1.9  2.3   8.6
    2      3       Bam Adebayo   C  25  MIA  75  ...   3.2  1.2  0.8  2.5  2.8  20.4
    3      4      Ochai Agbaji  SG  22  UTA  59  ...   1.1  0.3  0.3  0.7  1.7   7.9
    4      5      Santi Aldama  PF  22  MEM  77  ...   1.3  0.6  0.6  0.8  1.9   9.0
    ..   ...               ...  ..  ..  ...  ..  ...   ...  ...  ...  ...  ...   ...
    700  535    Thaddeus Young  PF  34  TOR  54  ...   1.4  1.0  0.1  0.8  1.6   4.4
    701  536        Trae Young  PG  24  ATL  73  ...  10.2  1.1  0.1  4.1  1.4  26.2
    702  537    Omer Yurtseven   C  24  MIA   9  ...   0.2  0.2  0.2  0.4  1.8   4.4
    703  538       Cody Zeller   C  30  MIA  15  ...   0.7  0.2  0.3  0.9  2.2   6.5
    704  539       Ivica Zubac   C  25  LAC  76  ...   1.0  0.4  1.3  1.5  2.9  10.8
    
    [705 rows x 30 columns]
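If the parsed frame contains repeated header rows (a header row repeated inside <tbody> typically ends up as an ordinary data row, with each cell holding its column name), they can be filtered out after parsing instead of being decomposed beforehand. A minimal sketch on a small hypothetical frame built from the first rows of the output above, with one such header row added for illustration; the helper drop_repeated_headers is a hypothetical name:

```python
import pandas as pd

# Hypothetical frame shaped like the read_html output above (values taken
# from its first rows), plus one repeated header row where every cell holds
# its column name, which is how a header repeated inside <tbody> is parsed.
df = pd.DataFrame({
    "Rk": ["1", "2", "Rk", "3"],
    "Player": ["Precious Achiuwa", "Steven Adams", "Player", "Bam Adebayo"],
    "PTS": ["9.2", "8.6", "PTS", "20.4"],
})


def drop_repeated_headers(frame, key="Rk"):
    """Return a copy without rows whose `key` cell equals the column name."""
    return frame[frame[key] != key].reset_index(drop=True)


clean = drop_repeated_headers(df)
# The header rows forced string (object) columns; convert numeric ones back.
clean["Rk"] = pd.to_numeric(clean["Rk"])
clean["PTS"] = pd.to_numeric(clean["PTS"])
print(clean)
```

On the real page the filter would be applied to dfs[0]. pd.read_html also accepts an attrs argument, for example attrs={"id": "per_game_stats"} (assuming that is the table's id), to select a single table instead of indexing into the returned list.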
    