Page

On the page above, clicking ‘Download CSV’ downloads a CSV file to your computer. I would like to set up a nightly process to download that CSV. I’m happy to scrape the data instead, but a CSV seems easier. I haven’t found a way to do this. Help?

3 Answers


  1. import requests
    
    def get_daily_stats(url):
        # Send browser-like headers; the Referer is the leaderboard page itself (the URL without csv=true)
        response = requests.get(url, timeout=30, headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
            'Referer': 'https://baseballsavant.mlb.com/leaderboard/custom?year=2024&type=batter&filter=&min=q&selections=pa%2Ck_percent%2Cbb_percent%2Cwoba%2Cxwoba%2Csweet_spot_percent%2Cbarrel_batted_rate%2Chard_hit_percent%2Cavg_best_speed%2Cavg_hyper_speed%2Cwhiff_percent%2Cswing_percent&chart=false&x=pa&y=pa&r=no&chartType=beeswarm&sort=xwoba&sortDir=desc'
        })
        # Raise on an HTTP error instead of saving an error page as the CSV
        response.raise_for_status()
        with open('daily_stats.csv', 'wb') as f:
            f.write(response.content)
    
    def main():
        url = 'https://baseballsavant.mlb.com/leaderboard/custom?year=2024&type=batter&filter=&min=q&selections=pa%2Ck_percent%2Cbb_percent%2Cwoba%2Cxwoba%2Csweet_spot_percent%2Cbarrel_batted_rate%2Chard_hit_percent%2Cavg_best_speed%2Cavg_hyper_speed%2Cwhiff_percent%2Cswing_percent&chart=false&x=pa&y=pa&r=no&chartType=beeswarm&sort=xwoba&sortDir=desc&csv=true'
        get_daily_stats(url)
    
    if __name__ == '__main__':
        main()
    

    This downloads the CSV and saves it as daily_stats.csv in the directory you run the script from (the current working directory, which is not necessarily the folder the script lives in). You’ll need to install requests first: python -m pip install requests. How to run it nightly depends on what works best for you: you could run it by hand every night, or is your goal to have a process on your computer that runs it automatically?
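For the nightly part, one option that stays inside Python is a loop that sleeps until a fixed time each night. This is only a rough sketch: seconds_until, run_nightly and the 03:00 run time are names and values made up for illustration, and an OS scheduler such as cron or Windows Task Scheduler would do the same job without keeping a script running.

```python
import datetime
import time


def seconds_until(hour, minute, now=None):
    """Seconds from `now` until the next occurrence of hour:minute (hypothetical helper)."""
    now = now or datetime.datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed, so aim for the same time tomorrow
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()


def run_nightly(job, hour=3, minute=0):
    """Call job() once a night at hour:minute, forever (assumed schedule)."""
    while True:
        time.sleep(seconds_until(hour, minute))
        job()
```

With the script above you would call run_nightly(main) and leave it running. If the computer is asleep or switched off at the chosen time, that night's download is simply missed.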

    The year is hard-coded in the URL (year=2024), so for the 2025 season you’ll need to change it there.
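To avoid editing the year by hand, the URL could be built from its query parameters. A minimal sketch, assuming the site keeps accepting the same parameters as the URL above; leaderboard_csv_url is a hypothetical helper, and defaulting to the current calendar year is my assumption (there may be no data for a new year until the season starts).

```python
import datetime
from urllib.parse import urlencode

BASE_URL = "https://baseballsavant.mlb.com/leaderboard/custom"

# Same stat columns as in the URL above
SELECTIONS = [
    "pa", "k_percent", "bb_percent", "woba", "xwoba", "sweet_spot_percent",
    "barrel_batted_rate", "hard_hit_percent", "avg_best_speed",
    "avg_hyper_speed", "whiff_percent", "swing_percent",
]


def leaderboard_csv_url(year=None):
    """Build the CSV URL for a given season (hypothetical helper)."""
    if year is None:
        # Assumption: the current calendar year is the season you want
        year = datetime.date.today().year
    params = {
        "year": year, "type": "batter", "filter": "", "min": "q",
        "selections": ",".join(SELECTIONS),
        "chart": "false", "x": "pa", "y": "pa", "r": "no",
        "chartType": "beeswarm", "sort": "xwoba", "sortDir": "desc",
        "csv": "true",
    }
    # urlencode percent-encodes the commas in selections as %2C
    return f"{BASE_URL}?{urlencode(params)}"
```

Calling leaderboard_csv_url(2024) reproduces the URL used in main() above, so get_daily_stats(leaderboard_csv_url()) would be a drop-in replacement.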

  2. import requests
    from bs4 import BeautifulSoup
    import os
    
    # URL of the webpage
    url = "https://baseballsavant.mlb.com/leaderboard/custom?year=2024&type=batter&filter=&min=q&selections=pa%2Ck_percent%2Cbb_percent%2Cwoba%2Cxwoba%2Csweet_spot_percent%2Cbarrel_batted_rate%2Chard_hit_percent%2Cavg_best_speed%2Cavg_hyper_speed%2Cwhiff_percent%2Cswing_percent&chart=false&x=pa&y=pa&r=no&chartType=beeswarm&sort=xwoba&sortDir=desc"
    
    # Send a GET request to the webpage
    response = requests.get(url)
    
    # Check if the request was successful
    if response.status_code == 200:
        # Parse the HTML content
        soup = BeautifulSoup(response.text, 'html.parser')
        
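        # Find the link to the CSV file; find() returns None if the HTML has no such link
        csv_anchor = soup.find('a', string='Download CSV')
        if csv_anchor is None or not csv_anchor.get('href'):
            raise SystemExit("No 'Download CSV' link found in the page HTML.")
        # The href may be relative, so resolve it against the page URL
        csv_link = requests.compat.urljoin(url, csv_anchor['href'])
        
        # Download the CSV file
        csv_response = requests.get(csv_link)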
        
        # Check if the request was successful
        if csv_response.status_code == 200:
            # Specify the directory to save the CSV file
            save_dir = "/path/to/save/directory"
            
            # Create the directory if it doesn't exist
            os.makedirs(save_dir, exist_ok=True)
            
            # Save the CSV file
            with open(os.path.join(save_dir, 'data.csv'), 'wb') as f:
                f.write(csv_response.content)
            
            print("CSV file downloaded successfully.")
        else:
            print("Failed to download CSV file.")
    else:
        print("Failed to retrieve webpage.")
    
  3. import requests
    import datetime
    
    def download_csv(url, filename):
        # Fetch the URL and save the response body only if the request succeeded
        response = requests.get(url)
        if response.status_code == 200:
            with open(filename, 'wb') as f:
                f.write(response.content)
            print(f"CSV file downloaded successfully as {filename}")
        else:
            print("Failed to download CSV file")
    
    if __name__ == "__main__":
        # URL the CSV file is served from (placeholder; replace with the real CSV URL)
        csv_url = "https://example.com/download/csv"
    
        # Filename to save the CSV file as; customize as needed
        timestamp = datetime.datetime.now().strftime("%Y-%m-%d")
        csv_filename = f"data_{timestamp}.csv"
    
        # Download the CSV file
        download_csv(csv_url, csv_filename)
    

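    This defines a function, download_csv, that takes a URL and a filename. It uses the requests library to fetch whatever the URL returns and saves it under that filename, which here includes today’s date so each night’s download gets its own file. The csv_url above is a placeholder; replace it with the URL that actually serves the CSV.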
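Since a 200 response can still be an HTML page rather than the data, it may be worth a quick sanity check before saving. The following is a rough sketch, not a full validator: looks_like_csv and its min_columns threshold are hypothetical, and the check only confirms that the bytes parse as rows with a consistent number of fields.

```python
import csv
import io


def looks_like_csv(content, min_columns=2):
    """Rough check (hypothetical helper) that downloaded bytes are CSV, not an HTML page."""
    # utf-8-sig drops a leading byte-order mark if the file has one
    text = content.decode("utf-8-sig", errors="replace").lstrip()
    if not text or text.startswith("<"):
        return False
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows or len(rows[0]) < min_columns:
        return False
    # Every non-empty row should have as many fields as the header row
    return all(len(row) == len(rows[0]) for row in rows)
```

Inside download_csv you could call looks_like_csv(response.content) before opening the file and skip the write when it returns False.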
