
I’m trying to collect some data from a game box score like this: https://fibalivestats.dcd.shared.geniussports.com/u/LEGBF/2213178/

The data is stored in a file (‘data.json’) which I managed to download from the Network tab in Chrome. I’ve been able to parse it and get the data I need.
Now I’m trying to pull the data directly from the URL (without downloading the file) to automate my data gathering from multiple pages of the same kind.
I’m no expert in requests to sites, especially when they are not static and the information is loaded dynamically, so forgive any bad phrasing of the concepts.

This is what I’ve tried so far:

from urllib.request import urlopen
import json

url = "https://fibalivestats.dcd.shared.geniussports.com/u/LEGBF/2213178/"

response = urlopen(url)
data = json.loads(response.read())

# JSON parsing and data gathering from data

which gives the error:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I then tried adding the ‘data.json’ at the end of the url:

url = "https://fibalivestats.dcd.shared.geniussports.com/u/LEGBF/2213178/data.json"

response = urlopen(url)
data = json.loads(response.read())

#json parsing and data gathering from data

which produces:

urllib.error.HTTPError: HTTP Error 403: Forbidden

From what I understand, in the first case the request just comes up empty, while in the second case it is not able to open the file.
I noticed that if I haven’t manually opened the page in Chrome, the https://…/data.json URL returns error 403; however, data.json loads correctly after I reload the page with Ctrl+R while the Network tab is open.
My understanding is that I need to perform some other action beyond requests.get() or anything similar from urllib in order to pull down the JSON file.
Could someone point me in the right direction?

2 Answers


  1. With the correct URL, your Python script loads the JSON without problems. The confusion arises because the wrong URL returns a 403 code rather than a 404.

    The 403 code is due to the permissions on the S3 bucket, as described in this blog post and in more detail in the AWS docs:

    If you don’t have the s3:ListBucket permission, Amazon S3 will return an HTTP status code 403 (“access denied”) error.

    If you look at the headers for the failed request, it reports that it is served by S3.

    If you look at the Chrome developer tools while the HTML page loads, the URL for the data is actually:
    https://fibalivestats.dcd.shared.geniussports.com/data/2213178/data.json

  2. You can use Selenium. For example, I scraped the players’ names. You can extend the code to collect whatever else you need.

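    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    # Page to scrape (the box score URL from the question)
    url = "https://fibalivestats.dcd.shared.geniussports.com/u/LEGBF/2213178/"

    # Selenium 4 takes the chromedriver path through a Service object
    driver = webdriver.Chrome(service=Service(r'C:\Users\Krieg\Downloads\chromedriver_win32\chromedriver.exe'))
    driver.get(url)

    # Each matching cell holds one player name
    players = driver.find_elements(By.CSS_SELECTOR, 'td.player-name.team-0-summary-leaders')
    for player in players:
        print(player.text)

    driver.quit()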
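
As a rough sketch of the first answer's approach, the JSON can probably be fetched without a browser by requesting the /data/<game id>/data.json URL directly. The helper names `data_url` and `fetch_game_data` are hypothetical, and the User-Agent header is an assumption in case the server rejects urllib's default one. The structure of the returned JSON is not shown here, so the parsing step is left to your existing code.

```python
import json
from urllib.request import Request, urlopen

# URL prefix taken from the first answer; everything else here is illustrative.
BASE_URL = "https://fibalivestats.dcd.shared.geniussports.com/data"


def data_url(game_id):
    """Build the data.json URL for one game id (hypothetical helper)."""
    return f"{BASE_URL}/{game_id}/data.json"


def fetch_game_data(game_id, timeout=10):
    """Download and parse the JSON for one game (hypothetical helper)."""
    # Assumption: a browser-like User-Agent may or may not be needed.
    request = Request(data_url(game_id), headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (needs network access):
# data = fetch_game_data(2213178)
```

To gather data from several games, you could call `fetch_game_data` in a loop over the game ids and pass each result to the parsing code you already have.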
    