
For a couple of years I have been using the following code to scrape a table from a website into an Excel file. All of a sudden it stopped working, and I can't figure out why. Here's an edited version of the code:

import requests
import pandas as pd

#These are the headers I pass
headers = {
    'accept': 'application/json, text/plain, */*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.6',
    'cookie': '[get the authentication cookie string from website and paste it here]',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'sec-gpc': '1',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36'
}
url = "https://[site].com/api/v1/teams/overview?league=ncaa&season=2023&week=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"

# The response is a dict; 'team_overview' points to a list with one dict per team
overview_2023 = requests.get(url, headers=headers).json()
teamdata = overview_2023['team_overview']

# Columns to keep; a field missing from a team's record becomes None
fields = ['name', 'franchise_id', 'abbreviation', 'wins', 'losses', 'ties',
          'points_allowed', 'points_scored', 'grades_coverage_defense', 'grades_defense',
          'grades_misc_st', 'grades_offense', 'grades_overall', 'grades_pass',
          'grades_pass_block', 'grades_pass_route', 'grades_pass_rush_defense',
          'grades_run', 'grades_run_block', 'grades_run_defense', 'grades_tackle']

SiteGrades = {}
for team in teamdata:
    SiteGrades[team['name']] = {field: team.get(field) for field in fields}

# One row per team (same result as building the frame and transposing it)
gradestable = pd.DataFrame.from_dict(SiteGrades, orient='index')

gradestable.to_excel(r'C:[path]2023SiteExports.xlsx', sheet_name='2023grades', index=False)

Out of nowhere, I get the JSONDecodeError: Expecting value.

I was expecting the result to be an Excel file with the requested data.

I have updated the authentication cookie, so that is not the issue.

When I test the code with:

# response is the result of the same requests.get(...) call as above
if response.status_code == 200:
    try:
        data = response.json()
    except ValueError:
        print("Response not in expected JSON format.")
        print("Response content:", response.text)
else:
    print("Request failed with status code:", response.status_code)
    print("Response content:", response.text)

I get "Response not in expected JSON format.", and the printed content is a lot of unreadable gibberish.
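The gibberish is consistent with a body that is still compressed when .json() tries to parse it: "JSONDecodeError: Expecting value" at position 0 just means the first character cannot start a JSON value. A small standard-library illustration follows. It uses gzip only as a stand-in, because Brotli (the br in the question's accept-encoding header) is not in the standard library, and the payload is made up:

```python
import gzip
import json

# Hypothetical payload, used only for illustration.
payload = b'{"team_overview": []}'

# Simulate a client that received a compressed body but never decompressed it:
# the raw bytes are turned into text and handed straight to the JSON parser.
garbled_text = gzip.compress(payload).decode("latin-1")

try:
    json.loads(garbled_text)
except json.JSONDecodeError as err:
    # The first character is the gzip magic byte 0x1f, which cannot start a JSON value.
    print(err.msg, "at position", err.pos)  # Expecting value at position 0

# Decompressing first recovers the original JSON.
print(json.loads(gzip.decompress(garbled_text.encode("latin-1"))))  # {'team_overview': []}
```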

But when I inspect the site in the browser's developer tools, the Response tab for the Fetch/XHR request shows clean, JSON-formatted data.
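The browser shows clean JSON because it decodes whatever Content-Encoding it advertised, Brotli (br) included. requests, via urllib3, decodes br only when the third-party brotli or brotlicffi package is installed; otherwise response.content is left compressed. A rough diagnostic sketch, assuming you feed it the status code, headers and raw body of the failing response; the helper names are hypothetical (not part of requests) and the checks are heuristics:

```python
import importlib.util
from typing import Mapping, Optional


def brotli_decoder_installed() -> bool:
    # urllib3 (the HTTP layer under requests) can only decode "br" bodies
    # when one of these third-party packages is importable.
    return any(importlib.util.find_spec(name) is not None
               for name in ("brotli", "brotlicffi"))


def diagnose_non_json(status_code: int, headers: Mapping[str, str], body: bytes,
                      brotli_available: Optional[bool] = None) -> str:
    """Return a short, best-guess explanation of why a body did not parse as JSON."""
    if brotli_available is None:
        brotli_available = brotli_decoder_installed()
    # Case-insensitive header lookup so this works with a plain dict too.
    encoding = next((value for key, value in headers.items()
                     if key.lower() == "content-encoding"), "").lower()
    tokens = {token.strip() for token in encoding.split(",")}
    if status_code != 200:
        return f"HTTP {status_code}: the server returned an error, not data"
    if "br" in tokens and not brotli_available:
        return "Body is Brotli-compressed and no Brotli decoder is installed"
    if body.lstrip()[:1] == b"<":
        return "Body looks like HTML (for example a login or error page)"
    if not body.strip():
        return "Body is empty"
    return f"Unrecognised body (Content-Encoding: {encoding or 'none'})"
```

With the failing response from the test above, this would be called as print(diagnose_non_json(response.status_code, response.headers, response.content)).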

2 Answers


  1. The URI has changed. Here is the new one:

    https://premium.pff.com/api/v1/teams/overview?league=ncaa&season=2023&week=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20

  2. Remove the headers= argument you're using (or at least its accept-encoding key). Try:

    import requests
    
    url = "https://premium.pff.com/api/v1/teams/overview?league=ncaa&season=2023&week=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
    
    data = requests.get(url).json()
    print(data)
    

    Prints:

    {'restricted': ['grades_coverage_defense', 'grades_defense', 'grades_misc_st',
    
    ...
    
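Putting both answers together, here is a sketch that targets the new URL from the first answer and keeps the question's headers (including the cookie) while dropping only accept-encoding, as the second answer suggests. Once that key is gone, requests sends its own default Accept-Encoding, which includes br only if a Brotli decoder is installed; installing the brotli package is the other way out. The function names are hypothetical:

```python
from typing import Dict, Iterable, Mapping
from urllib.parse import urlencode

# Base URL from the first answer; the query parameters are taken from the question.
OVERVIEW_URL = "https://premium.pff.com/api/v1/teams/overview"


def overview_url(season: int, weeks: Iterable[int] = range(21), league: str = "ncaa") -> str:
    # Build the query string instead of typing it by hand; safe="," keeps the
    # week list comma-separated exactly as in the original URL.
    query = urlencode({"league": league, "season": season,
                       "week": ",".join(str(w) for w in weeks)}, safe=",")
    return f"{OVERVIEW_URL}?{query}"


def without_accept_encoding(headers: Mapping[str, str]) -> Dict[str, str]:
    # Drop the hard-coded Accept-Encoding so requests falls back to its default,
    # which only advertises "br" when a Brotli decoder is installed.
    return {k: v for k, v in headers.items() if k.lower() != "accept-encoding"}


def fetch_overview(season: int, headers: Mapping[str, str]) -> dict:
    import requests  # imported here so the helpers above work without requests installed

    response = requests.get(overview_url(season),
                            headers=without_accept_encoding(headers), timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of a JSON error
    return response.json()
```

In the question's script, overview_2023 = fetch_overview(2023, headers) would then replace the hand-built requests.get(...).json() call.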