I am new to coding and need some assistance. I am trying to make a web scraper for a project that involves scraping NFL roster data from 2000 to 2023, but I am getting an error when requesting the HTML. I am using JupyterLab (Python-Pyodide) to write my code, and this is the only code I have:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from io import StringIO
years = list(range(2000, 2024))
url = 'https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023'
data = requests.get(url)
This is the error I’m getting:
(JsException: NetworkError: Failed to execute ‘send’ on ‘XMLHttpRequest’: Failed to load ‘https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023’.)
Can you explain why I am getting this error and how I can fix it?
2 Answers
You didn’t specify the request headers. Also, this page doesn’t have table tags, so you can’t use pd.read_html.
OUTPUT:
You need to send headers with your get request, specifically User-Agent. When you send this value, the request looks as if it comes from a browser (i.e. a real person) rather than a bot/scraper. You can find this value easily by Googling "what is my user agent". Copy that entire thing; you will need it in a minute.

Declare a dict using the value you copied:

Pass headers as an argument in the get method:

Here is the scenic route to get your User-Agent and see what values are/could be there in "headers", if you’re interested:
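For reference, the two steps from this answer (declare the dict, then pass it to get) could look roughly like the sketch below. It assumes a regular Python install where requests can reach the site directly; nothing in this thread confirms that headers alone clear the error inside a Pyodide kernel. The User-Agent string is only a placeholder and fetch_roster_html is a hypothetical helper name, neither comes from the question:

```python
import requests

# Placeholder value (an assumption): replace it with the string your own
# browser reports when you search for "what is my user agent".
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

# Step 1: declare a dict holding the header you want to send.
headers = {"User-Agent": USER_AGENT}

url = "https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023"


def fetch_roster_html(page_url: str) -> str:
    """Hypothetical helper: request the page with the headers above."""
    # Step 2: pass the dict through the `headers` argument of get().
    response = requests.get(page_url, headers=headers, timeout=30)
    response.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return response.text


# Build the request without sending it, just to inspect what would go out.
prepared = requests.Request("GET", url, headers=headers).prepare()
print(prepared.headers["User-Agent"])
```

Calling fetch_roster_html(url) performs the actual request; the returned string can then be handed to BeautifulSoup(html, "html.parser") as in the question's imports.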