For a couple of years I have been using the following code to scrape a table from a website and write it to an Excel file. All of a sudden it has stopped working, and I can't figure out why. Here is an edited version of the code:
```python
import requests
import pandas as pd

# These are the headers I pass (copied from the browser's request)
headers = {
    'accept': 'application/json, text/plain, */*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.6',
    'cookie': '[get the authentication cookie string from website and paste it here]',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'sec-gpc': '1',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36'
}

url = 'https://[site].com/api/v1/teams/overview?league=ncaa&season=2023&week=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20'
overview_2023 = requests.get(url, headers=headers).json()

# overview_2023['team_overview'] is a list with one dict per team
teamdata = overview_2023['team_overview']

SiteGrades = {}
for team in teamdata:
    SiteGrades[team['name']] = {
        'name': team['name'],
        'franchise_id': team['franchise_id'],
        'abbreviation': team['abbreviation'],
        'wins': team['wins'] if 'wins' in team else None,
        'losses': team['losses'] if 'losses' in team else None,
        'ties': team['ties'] if 'ties' in team else None,
        'points_allowed': team['points_allowed'] if 'points_allowed' in team else None,
        'points_scored': team['points_scored'] if 'points_scored' in team else None,
        'grades_coverage_defense': team['grades_coverage_defense'] if 'grades_coverage_defense' in team else None,
        'grades_defense': team['grades_defense'] if 'grades_defense' in team else None,
        'grades_misc_st': team['grades_misc_st'] if 'grades_misc_st' in team else None,
        'grades_offense': team['grades_offense'] if 'grades_offense' in team else None,
        'grades_overall': team['grades_overall'] if 'grades_overall' in team else None,
        'grades_pass': team['grades_pass'] if 'grades_pass' in team else None,
        'grades_pass_block': team['grades_pass_block'] if 'grades_pass_block' in team else None,
        'grades_pass_route': team['grades_pass_route'] if 'grades_pass_route' in team else None,
        'grades_pass_rush_defense': team['grades_pass_rush_defense'] if 'grades_pass_rush_defense' in team else None,
        'grades_run': team['grades_run'] if 'grades_run' in team else None,
        'grades_run_block': team['grades_run_block'] if 'grades_run_block' in team else None,
        'grades_run_defense': team['grades_run_defense'] if 'grades_run_defense' in team else None,
        'grades_tackle': team['grades_tackle'] if 'grades_tackle' in team else None,
    }

# from_dict makes one column per team; transpose so each team becomes a row
gradestable = pd.DataFrame.from_dict(SiteGrades).T
gradestable.to_excel(r'C:[path]2023SiteExports.xlsx', sheet_name='2023grades', index=False)
```
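The loop above repeats `team[key] if key in team else None` for every field, and `dict.get` does exactly that in one call. A minimal sketch of the same row-building step follows; the `sample` data is hypothetical and only mimics the shape of `overview_2023['team_overview']`, and `FIELDS` / `build_rows` are illustrative names, not part of the site's API.

```python
# Field names copied from the question; any key missing from a team dict becomes None.
FIELDS = [
    "name", "franchise_id", "abbreviation", "wins", "losses", "ties",
    "points_allowed", "points_scored", "grades_coverage_defense",
    "grades_defense", "grades_misc_st", "grades_offense", "grades_overall",
    "grades_pass", "grades_pass_block", "grades_pass_route",
    "grades_pass_rush_defense", "grades_run", "grades_run_block",
    "grades_run_defense", "grades_tackle",
]


def build_rows(team_overview):
    """Return one dict per team, with keys in FIELDS order.

    team.get(key) returns None for a missing key, which is what the
    original `team[key] if key in team else None` expression did.
    """
    return [{key: team.get(key) for key in FIELDS} for team in team_overview]


# Hypothetical sample shaped like overview_2023['team_overview']
sample = [
    {"name": "Team A", "franchise_id": 1, "abbreviation": "TA", "wins": 3},
    {"name": "Team B", "franchise_id": 2, "abbreviation": "TB", "grades_overall": 71.5},
]
rows = build_rows(sample)
print(rows[0]["wins"], rows[1]["wins"])  # 3 None
```

Unlike the original, this variant also tolerates a missing `name`, `franchise_id` or `abbreviation` (they become `None` instead of raising `KeyError`). The list can be passed straight to `pd.DataFrame(rows, columns=FIELDS)`, which gives one row per team without the `from_dict(...).T` step.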
Out of nowhere, the `.json()` call fails with `JSONDecodeError: Expecting value`.
I was expecting the result to be an Excel file with the requested data.
I have updated the authentication cookie, so that is not the issue.
When I test the code with:
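```python
# response is the raw Response object, requested with the same URL and headers as above
response = requests.get('https://[site].com/api/v1/teams/overview?league=ncaa&season=2023&week=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20', headers=headers)

if response.status_code == 200:
    try:
        data = response.json()
    except ValueError:
        print("Response not in expected JSON format.")
        print("Response content:", response.text)
else:
    print("Request failed with status code:", response.status_code)
    print("Response content:", response.text)
```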
I get "Response not in expected JSON format." followed by a lot of gibberish as the printed content.
But when I inspect the request on the site, the "Response" tab under Fetch/XHR in the browser's developer tools shows cleanly formatted JSON.
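One way to see how "gibberish plus `Expecting value`" can happen is to parse a body that was compressed but never decompressed. The sketch below is offline and hedged: it uses `gzip` as a stand-in for Brotli (`br`), since Brotli is not in the standard library, and `payload` / `try_parse` are hypothetical names. `try_parse` decodes with replacement characters, roughly as `requests`' `Response.text` does, before calling `json.loads`.

```python
import gzip
import json

payload = {"team_overview": [{"name": "Team A"}]}  # hypothetical JSON body
raw_json = json.dumps(payload).encode("utf-8")

# Simulate a body the server compressed but the client never decompressed.
compressed = gzip.compress(raw_json)


def try_parse(body: bytes):
    """Decode bytes to text (invalid bytes replaced), then parse as JSON."""
    text = body.decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return f"JSONDecodeError: {exc.msg}"


print(try_parse(compressed))                   # JSONDecodeError: Expecting value
print(try_parse(gzip.decompress(compressed)))  # {'team_overview': [{'name': 'Team A'}]}
```

With a live response, `response.headers.get('Content-Encoding')` shows which coding the server applied; a value of `br` together with unreadable `response.text` would point at a Brotli body that was never decoded.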
Answers
The URI has changed. Here is the new one:
https://premium.pff.com/api/v1/teams/overview?league=ncaa&season=2023&week=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
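Rather than pasting the long query string by hand, one option is to build it from a dict with the standard library. This is a sketch; `BASE_URL` and `params` are illustrative names, and the endpoint is the one given in the answer.

```python
from urllib.parse import urlencode

BASE_URL = "https://premium.pff.com/api/v1/teams/overview"
params = {
    "league": "ncaa",
    "season": 2023,
    # weeks 0..20 as a comma-separated list
    "week": ",".join(str(week) for week in range(21)),
}
# safe="," keeps the commas literal instead of encoding them as %2C
url = f"{BASE_URL}?{urlencode(params, safe=',')}"
print(url)
```

Alternatively, `requests.get(BASE_URL, params=params, headers=...)` builds the query itself; it percent-encodes the commas as `%2C`, which servers normally decode back to the same value.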
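Remove the `headers=` argument you're using (or at least the `accept-encoding` key). The `br` in `gzip, deflate, br` asks the server for Brotli compression, and `requests` can only decode Brotli when a Brotli package (`brotli` or `brotlicffi`) is installed; otherwise the body stays compressed, which would explain both the gibberish in `response.text` and the `JSONDecodeError`.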
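A minimal sketch of the "at least drop the `accept-encoding` key" option: copy the browser headers without that key, matching the name case-insensitively. `without_accept_encoding` and `browser_headers` are hypothetical names; the header values are trimmed from the question.

```python
def without_accept_encoding(headers: dict) -> dict:
    """Return a copy of `headers` without Accept-Encoding (any letter case)."""
    return {name: value for name, value in headers.items()
            if name.lower() != "accept-encoding"}


browser_headers = {
    "accept": "application/json, text/plain, */*",
    "accept-encoding": "gzip, deflate, br",
    "cookie": "[get the authentication cookie string from website and paste it here]",
}
safe_headers = without_accept_encoding(browser_headers)
print(sorted(safe_headers))  # ['accept', 'cookie']
```

Use it as `requests.get(url, headers=without_accept_encoding(headers))`. With the header gone, `requests` sends its own default `Accept-Encoding`, which should only list codings it can decode. Installing `brotli` (`pip install brotli`) is another route, since urllib3 decodes `br` when that package is available.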