Html - How to automatically scrape the following CSV

Sparkles
April 24, 2024
151 views
3 votes
3 Answers

On the page above if you click ‘Download CSV’ it will download a CSV file to your computer. I would like to set up a nightly process to download that CSV. I’m happy to scrape the data as well, a CSV just seems easier. I’m not really finding anything. Help?

Answers

import requests

def get_daily_stats(url):
    # Send browser-like headers; the Referer is the leaderboard page itself
    response = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
        'Referer': 'https://baseballsavant.mlb.com/leaderboard/custom?year=2024&type=batter&filter=&min=q&selections=pa%2Ck_percent%2Cbb_percent%2Cwoba%2Cxwoba%2Csweet_spot_percent%2Cbarrel_batted_rate%2Chard_hit_percent%2Cavg_best_speed%2Cavg_hyper_speed%2Cwhiff_percent%2Cswing_percent&chart=false&x=pa&y=pa&r=no&chartType=beeswarm&sort=xwoba&sortDir=desc'
    }, timeout=30)
    # Stop on an HTTP error instead of saving an error page as the CSV
    response.raise_for_status()
    # Write the raw bytes to daily_stats.csv in the current working directory
    with open('daily_stats.csv', 'wb') as f:
        f.write(response.content)

def main():
    # Same leaderboard URL as the page, with csv=true appended to request the CSV
    url = 'https://baseballsavant.mlb.com/leaderboard/custom?year=2024&type=batter&filter=&min=q&selections=pa%2Ck_percent%2Cbb_percent%2Cwoba%2Cxwoba%2Csweet_spot_percent%2Cbarrel_batted_rate%2Chard_hit_percent%2Cavg_best_speed%2Cavg_hyper_speed%2Cwhiff_percent%2Cswing_percent&chart=false&x=pa&y=pa&r=no&chartType=beeswarm&sort=xwoba&sortDir=desc&csv=true'
    get_daily_stats(url)

if __name__ == '__main__':
    main()

This will download the CSV and save it as daily_stats.csv in the folder you run the script from (the current working directory). You'll need to install requests first: python -m pip install requests. How to run it nightly is more a matter of what works best for you. You could just run it by hand every night, or is your goal to have a process on your computer that runs it automatically?
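On the nightly part: the usual route is to let the operating system's scheduler (cron on Linux/macOS, Task Scheduler on Windows) launch the script. If you'd rather keep everything in Python, here is a rough sketch of a loop that sleeps until a fixed hour and then runs a job. The names seconds_until and run_nightly are made up for illustration, and 2 a.m. is an arbitrary choice.

```python
import datetime
import time


def seconds_until(hour, now=None):
    """Seconds from `now` until the next occurrence of hour:00 local time."""
    now = now or datetime.datetime.now()
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        # That time has already passed today, so aim for tomorrow
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()


def run_nightly(job, hour=2):
    """Call job() once a day at hour:00, forever. Blocks the current process."""
    while True:
        time.sleep(seconds_until(hour))
        job()


# Example (not executed here): run_nightly(main, hour=2),
# where main is the download function from the script above.
```

The process has to stay running for this to work, which is why a system scheduler is usually the more robust option.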

The URL hard-codes year=2024, so I suppose it will stop giving you current data in 2025, but you could just change the year in the URL at that point.
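To avoid editing the year by hand, the URL could be assembled from its query parameters with the year filled in at run time. This is only a sketch: build_csv_url is a hypothetical helper, the parameters are copied from the URL above, and defaulting to the current calendar year is an assumption (early in the year there may be no data for the new season yet).

```python
import datetime
from urllib.parse import urlencode

BASE_URL = "https://baseballsavant.mlb.com/leaderboard/custom"

# Stat columns taken from the `selections` parameter of the original URL
SELECTIONS = [
    "pa", "k_percent", "bb_percent", "woba", "xwoba", "sweet_spot_percent",
    "barrel_batted_rate", "hard_hit_percent", "avg_best_speed",
    "avg_hyper_speed", "whiff_percent", "swing_percent",
]


def build_csv_url(year=None):
    """Return the leaderboard CSV URL for `year` (default: current year)."""
    year = year or datetime.date.today().year
    params = {
        "year": year,
        "type": "batter",
        "filter": "",
        "min": "q",
        "selections": ",".join(SELECTIONS),  # commas are encoded as %2C
        "chart": "false",
        "x": "pa",
        "y": "pa",
        "r": "no",
        "chartType": "beeswarm",
        "sort": "xwoba",
        "sortDir": "desc",
        "csv": "true",
    }
    return f"{BASE_URL}?{urlencode(params)}"
```

Calling build_csv_url(2024) reproduces the URL used in main() above; passing a different year only changes the year parameter.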

import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# URL of the webpage
url = "https://baseballsavant.mlb.com/leaderboard/custom?year=2024&type=batter&filter=&min=q&selections=pa%2Ck_percent%2Cbb_percent%2Cwoba%2Cxwoba%2Csweet_spot_percent%2Cbarrel_batted_rate%2Chard_hit_percent%2Cavg_best_speed%2Cavg_hyper_speed%2Cwhiff_percent%2Cswing_percent&chart=false&x=pa&y=pa&r=no&chartType=beeswarm&sort=xwoba&sortDir=desc"

# Send a GET request to the webpage
response = requests.get(url, timeout=30)

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the link to the CSV file. This assumes the static HTML contains an
    # <a> element with exactly this text; if the button is rendered by
    # JavaScript, find() returns None.
    link = soup.find('a', string='Download CSV')

    if link is None or not link.get('href'):
        print("Could not find a 'Download CSV' link in the page.")
    else:
        # Resolve a relative href against the page URL, then download the CSV
        csv_link = urljoin(url, link['href'])
        csv_response = requests.get(csv_link, timeout=30)

        # Check if the request was successful
        if csv_response.status_code == 200:
            # Specify the directory to save the CSV file
            save_dir = "/path/to/save/directory"

            # Create the directory if it doesn't exist
            os.makedirs(save_dir, exist_ok=True)

            # Save the CSV file
            with open(os.path.join(save_dir, 'data.csv'), 'wb') as f:
                f.write(csv_response.content)

            print("CSV file downloaded successfully.")
        else:
            print("Failed to download CSV file.")
else:
    print("Failed to retrieve webpage.")

import requests
import datetime

def download_csv(url, filename):
    # Fetch the URL and save the response body to `filename`
    response = requests.get(url, timeout=30)
    if response.status_code == 200:
        with open(filename, 'wb') as f:
            f.write(response.content)
        print(f"CSV file downloaded successfully as {filename}")
    else:
        print("Failed to download CSV file")

if __name__ == "__main__":
    # URL of the webpage where the CSV file is located
    csv_url = "https://example.com/download/csv"

    # Filename to save the CSV file as
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d")
    csv_filename = f"data_{timestamp}.csv"  # You can customize the filename as needed

    # Download the CSV file
    download_csv(csv_url, csv_filename)

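This defines a function (download_csv) that takes a URL and a filename as input. It uses the requests library to fetch the content at that URL and saves it under the given filename on your computer. The filename includes the current date, so each night's download is kept as a separate file. The csv_url shown is a placeholder; replace it with the actual CSV URL.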
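For an unattended nightly job, it may be worth checking that what came back is actually CSV before writing it to disk, since a changed page or an error response could otherwise be saved under a .csv name. A minimal heuristic sketch follows; looks_like_csv and the min_columns threshold are hypothetical names and values, not part of any library.

```python
import csv
import io


def looks_like_csv(content: bytes, min_columns: int = 2) -> bool:
    """Heuristic: True if the first line of the payload parses as a CSV header."""
    # utf-8-sig drops a leading byte-order mark if the file has one
    text = content.decode("utf-8-sig", errors="replace").lstrip()
    if not text or text.startswith("<"):
        # Empty body, or markup such as an HTML page
        return False
    header = next(csv.reader(io.StringIO(text)), [])
    return len(header) >= min_columns
```

It could be called on response.content inside download_csv, skipping the write when it returns False.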
