I am doing a web scraping project; my main goal is to scrape the website basketball-reference.com. I want to extract statistics for the best player on the Miami Heat and on 2 other Miami-based teams, to make something along the lines of digital flash cards for comparing performance. Here is my current code in progress (the current output is the else statement, "Jimmy Butler's statistics not found.").
Before I show the code: I am still learning the ins and outs of web scraping. What is the exact process for properly extracting the desired information from the HTML? I appreciate any and all help!
Current code as of 7/5/2023:
import requests
from bs4 import BeautifulSoup
# URL of the webpage with Jimmy Butler's statistics
url = "https://www.basketball-reference.com/players/b/butleji01.html"
# Send a GET request to the webpage
response = requests.get(url)
html_content = response.content
# Create a BeautifulSoup object to parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")
# Find the HTML table that contains the statistics
table = soup.find("table", {"id": "per_game"})
# Extract the table headers (column names)
headers = table.find("thead").find_all("th")
column_names = [header.text for header in headers]
# Find the row that corresponds to Jimmy Butler
rows = table.find("tbody").find_all("tr")
jimmy_butler_row = None
for row in rows:
if row.find("th").text == "Jimmy Butler":
jimmy_butler_row = row
break
# Check if the row for Jimmy Butler was found
if jimmy_butler_row is not None:
# Extract the statistics for Jimmy Butler
stats = jimmy_butler_row.find_all("td")
pts = stats[22.9].text
ast = stats[5.3].text
fg_percentage = stats[53.9].text
ft_percentage = stats[85.0].text
# Store the extracted statistics in a data structure
jimmy_butler_stats = {
"Points (PTS)": pts,
"Assists (AST)": ast,
"Field Goal Percentage (FG%)": fg_percentage,
"Free Throw Percentage (FT%)": ft_percentage
}
# Print the extracted statistics
print("Jimmy Butler's Statistics:")
for stat_name, stat_value in jimmy_butler_stats.items():
print(stat_name + ":", stat_value)
else:
print("Jimmy Butler's statistics not found.")
2 Answers
You have no indentation after your if statement, and none after your else statement, so nothing is executed based on their evaluation. Indentation is what ties a code block to its if or for statement, since Python does not use braces for that as other programming languages do.
I guess you wanted to do something like this:
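A minimal sketch of the indentation fix, not necessarily the exact code intended here. It assumes a small made-up HTML snippet (SAMPLE_HTML, with illustrative values) in place of the live page so it runs offline, and the helper name find_player_row is hypothetical. The lookup logic is the same as in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the player page (illustrative values, not real data).
# Note that the row headers hold season labels, not player names.
SAMPLE_HTML = """
<table id="per_game">
  <thead><tr><th>Season</th><th>PTS</th></tr></thead>
  <tbody>
    <tr><th>2021-22</th><td>20.0</td></tr>
    <tr><th>2022-23</th><td>22.9</td></tr>
  </tbody>
</table>
"""


def find_player_row(html, name):
    """Return the first body row whose <th> text equals name, else None."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": "per_game"})
    rows = table.find("tbody").find_all("tr")
    for row in rows:
        # Indented under the for: runs once per row
        if row.find("th").text == name:
            # Indented under the if: runs only when the text matches
            return row
    # Back at function level: reached only if no row matched
    return None


row = find_player_row(SAMPLE_HTML, "Jimmy Butler")
if row is not None:
    print("Row found")
else:
    print("Jimmy Butler's statistics not found.")
```

With this sample the else branch still runs, because no row header contains the player's name.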
EDIT: It seems I missed another indentation error.
This still does not find Jimmy Butler's statistics, since there is no table header (th) with the text Jimmy Butler. I only get some dates if I print out all the th elements:
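One way to run that check, sketched against a made-up snippet rather than the live page. SAMPLE_HTML and the helper name row_header_texts are hypothetical, and the season labels are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the player page's per-game table.
SAMPLE_HTML = """
<table id="per_game">
  <thead><tr><th>Season</th><th>PTS</th></tr></thead>
  <tbody>
    <tr><th>2021-22</th><td>20.0</td></tr>
    <tr><th>2022-23</th><td>22.9</td></tr>
  </tbody>
</table>
"""


def row_header_texts(html, table_id="per_game"):
    """Collect the text of every <th> inside the table body."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": table_id})
    body = table.find("tbody")
    return [th.get_text(strip=True) for th in body.find_all("th")]


# Shows what the comparison in the loop is actually being run against
print(row_header_texts(SAMPLE_HTML))
```

Printing the values being compared is usually the quickest way to see why an equality test never matches.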
Looking at that page, I also cannot see a table header with the text Jimmy Butler. Are you maybe using the wrong page/URL?
There are no <th> tags with the text 'Jimmy Butler', hence that if statement will evaluate to False and fall through to your else statement. Based on your description, I am going to assume you are trying to pull stats from the team page, https://www.basketball-reference.com/teams/MIA/2023.html
There are a few other things you need to fix, and I'll comment on them in the code:
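A rough sketch of those fixes, run against a made-up snippet instead of the live team page. The table id, the data-stat attribute names (player, pts_per_g, ast_per_g, fg_pct, ft_pct) and the helper name player_stats are assumptions for illustration, so inspect the real page's HTML to confirm them before relying on this.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the team page's per-game table.
# Attribute names and values are illustrative assumptions.
SAMPLE_HTML = """
<table id="per_game">
  <tbody>
    <tr>
      <th data-stat="ranker">1</th>
      <td data-stat="player">Jimmy Butler</td>
      <td data-stat="fg_pct">.539</td>
      <td data-stat="ft_pct">.850</td>
      <td data-stat="ast_per_g">5.3</td>
      <td data-stat="pts_per_g">22.9</td>
    </tr>
  </tbody>
</table>
"""

# Display label -> assumed data-stat attribute of the cell holding it
WANTED = {
    "Points (PTS)": "pts_per_g",
    "Assists (AST)": "ast_per_g",
    "Field Goal Percentage (FG%)": "fg_pct",
    "Free Throw Percentage (FT%)": "ft_pct",
}


def player_stats(html, name):
    """Return a dict of the wanted stats for name, or None if not found."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": "per_game"})
    if table is None:  # guard: find() returns None when nothing matches
        return None
    for row in table.find("tbody").find_all("tr"):
        # On a team page the name sits in a cell of the row, so match on that
        name_cell = row.find("td", {"data-stat": "player"})
        if name_cell is None or name_cell.get_text(strip=True) != name:
            continue
        # List indices must be integers, so stats[22.9] raises TypeError.
        # Looking cells up by attribute avoids counting column positions.
        stats = {}
        for label, key in WANTED.items():
            cell = row.find("td", {"data-stat": key})
            stats[label] = cell.get_text(strip=True) if cell else None
        return stats
    return None


result = player_stats(SAMPLE_HTML, "Jimmy Butler")
if result is not None:
    print("Jimmy Butler's Statistics:")
    for stat_name, stat_value in result.items():
        print(stat_name + ":", stat_value)
else:
    print("Jimmy Butler's statistics not found.")
```

For the real page you would pass requests.get(url).text as html. If the table comes back as None there, check whether the site delivers it differently from what the browser shows.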
Output:
Lastly, tables are a great way to learn BeautifulSoup and HTML since they are well structured. However, once you get the hang of it, consider using pandas to pull table tags:
Then you could just filter the df.
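A small sketch of the pandas route, assuming a made-up table (SAMPLE_HTML, illustrative values, one placeholder player) in place of the live page. pandas.read_html needs an HTML parser such as lxml installed, and the column names on the real page may differ from these.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the team page's per-game table.
SAMPLE_HTML = """
<table id="per_game">
  <thead>
    <tr><th>Player</th><th>AST</th><th>PTS</th></tr>
  </thead>
  <tbody>
    <tr><td>Jimmy Butler</td><td>5.3</td><td>22.9</td></tr>
    <tr><td>Example Player</td><td>1.0</td><td>10.0</td></tr>
  </tbody>
</table>
"""

# read_html returns a list of DataFrames, one per matching <table>
tables = pd.read_html(StringIO(SAMPLE_HTML), attrs={"id": "per_game"})
df = tables[0]

# Filter the DataFrame down to the player of interest
butler = df[df["Player"] == "Jimmy Butler"]
print(butler[["Player", "PTS", "AST"]])
```

For the live site you would fetch the page with requests first and pass StringIO(response.text) in the same way.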
Output: