
I am new to coding and need some assistance. I am trying to build a web scraper for a project that involves scraping NFL roster data from 2000 to 2023, but I am getting an error when requesting the HTML. I am using JupyterLab (Python via Pyodide) to write my code, and this is the only code I have:

import requests
from bs4 import BeautifulSoup
import pandas as pd
from io import StringIO

years = list(range(2000, 2024))
url = 'https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023'
data = requests.get(url)

This is the error I’m getting:

(JsException: NetworkError: Failed to execute ‘send’ on ‘XMLHttpRequest’: Failed to load ‘https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023’.)

Can you explain why I am getting this error and how I can fix it?

2 Answers


  1. You didn't specify the request headers. Note also that this page doesn't use table tags, so you can't use pd.read_html; the roster is laid out with divs instead.

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd
    
    
    url = "https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023"
    # Browser-like headers so the site doesn't reject the request as a bot
    headers = {
      'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
      'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36'
    }
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'lxml')
    # The roster is a div-based table, not a <table> element
    table = soup.find('div', class_='divtable divtable-striped divtable-mobile')
    # Column names come from the children of the header row
    table_head = [head.get_text() for head in table.find('div', class_='thead')]
    # Remove mobile-only label spans so cell text stays clean
    for s in table.find_all('span', class_='visible-xs-inline'):
        s.extract()
    result = []
    for row in table.find_all('div', class_='tr'):
        # Pair each column name with the matching cell text in this row
        result.append(dict(zip(table_head, [cell.get_text() for cell in row.find_all('div', class_='td')])))
    df = pd.DataFrame(result)
    print(df)
    

    OUTPUT:

         #            Player Pos   G  GS Age            College
    0   82   Andre Baccellia  WR   5   0  26         Washington
    1    3       Budda Baker  DB  12  12  27         Washington
    2   96        Eric Banks  DE   2   0  25  Texas-San Antonio
    3   51       Krys Barnes  LB  16   6  25               UCLA
    4   66    Jackson Barton  OT   1   0  28               Utah
    ..  ..               ...  ..  ..  ..  ..                ...
    73  21  Garrett Williams  DB   9   6  22           Syracuse
    74  27     Divaad Wilson  DB   2   1  23    Central Florida
    75  20      Marco Wilson  DB  15  11  24            Florida
    76  14    Michael Wilson  WR  13  12  23           Stanford
    77  10        Josh Woods  LB  11   7  27           Maryland
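
    The years list from the question is never actually used. Assuming footballdb.com keeps the same URL pattern for every season (only the final year segment changes, as in the 2023 example), the parsing above can be wrapped in a loop over all seasons. roster_url is a hypothetical helper name for this sketch, not part of any library:

    ```python
    def roster_url(team_slug: str, year: int) -> str:
        # Assumed URL pattern, generalized from the 2023 example in the question
        return f"https://www.footballdb.com/teams/nfl/{team_slug}/roster/{year}"

    if __name__ == "__main__":
        import requests  # network calls only happen when run as a script

        headers = {
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                          "(KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36"
        }
        for year in range(2000, 2024):
            response = requests.get(roster_url("arizona-cardinals", year),
                                    headers=headers)
            print(year, response.status_code)
            # ...parse response.text with BeautifulSoup as shown above,
            # then collect each season's DataFrame into one list...
    ```

    Being polite with a short time.sleep between requests is a good idea when looping over 24 seasons.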
    
  2. You need to send headers with your get request, specifically User-Agent. When you send this value, the request appears to come from a browser, i.e. a real person rather than a bot/scraper. You can find this value easily by Googling "what is my user agent". Copy the entire string; you will need it in a minute.

    Declare a dict using the value you copied:

    my_headers = {
        "User-Agent": "<YOUR_VALUE>"
    }
    

    Pass headers as an argument in the get method:

    my_url = "https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023"
    data = requests.get(url=my_url, headers=my_headers)
    print(data.content) # just to confirm you got the response back
    

    Here is the scenic route to get your User-Agent and see what values are/could be there in "headers", if you’re interested:

    1. Hit F12 on your keyboard when viewing this page. The developer tools will open up.
    2. Navigate to the "Network" tab
    3. Choose "All"
    4. If you don’t see anything, no worries; just refresh the page
    5. Click on an item; another section will pop up
    6. Click on "Headers" and scroll down until you find "User-Agent"