I have a webpage (https://deos.udel.edu/data/daily_retrieval.php) I want to extract data from. However, the data is precipitation data tied to a specific selection made within the webpage: the station and the date. I am using the R package rvest, and I am not sure whether this data request can be done in R with rvest. Some of the relevant source code for the webpage is shown below.
<label class="retsection" for="station">Station:</label><br>
<select class="statlist" name="station" id="station" size="10">
<option class="select_input" value="DTBR" selected>Adamsville, DE-Taber</option>
<option class="select_input" value="DBUR">Angola, DE-Burton Pond</option>
<option class="select_input" value="DWHW">Atglen, PA-Wolfs Hollow</option>
<option class="select_input" value="DBBB">Bethany Beach, DE-Boardwalk</option>
<option class="select_input" value="DBNG">Bethany Beach, DE-NGTS</option>
<option class="select_input" value="DBKB">Blackbird, DE-NERR</option>
<option class="select_input" value="DBRG">Bridgeville, DE</option>
<label class="retsection">Date:<br> </label>
<select name='month' size='6' length='10'>
<option value='1'>January</option>
<option value='2'>February</option>
<option value='3'>March</option>
<option value='4'>April</option>
<option value='5'>May</option>
<option value='6'>June</option>
<option value='7'>July</option>
<option value='8' selected>August</option>
<option value='9'>September</option>
<option value='10'>October</option>
<option value='11'>November</option>
<option value='12'>December</option>
</select>
<select name='day' size='6' length='4'>
<option value='1'>1</option>
<option value='2' selected>2</option>
<option value='3'>3</option>
<option value='4'>4</option>
<option value='5'>5</option>
<option value='6'>6</option>
My initial thought is that this task cannot be done, since the precipitation data is not displayed on the webpage itself; it pops up in a separate window after the selection is made. I have an access key provided by the webpage, but I am not 100% sure whether it can be used to retrieve the large dataset I want to pull.
- Is this type of data request feasible with rvest?
- What are some suggested methods for extracting large amounts of data via R? For example, a year's worth of precipitation data for a specific station of interest.
Thanks.
2 Answers
rvest is designed for scraping static HTML, but the precipitation data appears in a separate window or is loaded dynamically based on user selections.
So you have to use a different approach.
To retrieve a range of precipitation data without requesting each day by hand, you can loop through the desired date range and make the requests programmatically.
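As a rough sketch of the first half of that idea, the date range can be expanded into one set of form values per day before any request is made. The field names station, month and day come from the HTML in the question; the year field is an assumption, since that part of the form is not shown, and make_params is a hypothetical helper name.

```r
# Build one row of form parameters per day in the range.
# `station`, `month`, `day` mirror the form fields shown in the question;
# `year` is an ASSUMPTION (that part of the form is not shown).
make_params <- function(station, start, end) {
  dates <- seq(as.Date(start), as.Date(end), by = "day")
  data.frame(
    station = station,
    year    = as.integer(format(dates, "%Y")),
    month   = as.integer(format(dates, "%m")),
    day     = as.integer(format(dates, "%d")),
    stringsAsFactors = FALSE
  )
}

# One year of daily requests for the Adamsville, DE-Taber station
params <- make_params("DTBR", "2023-01-01", "2023-12-31")
nrow(params)      # 365 rows, one per day
head(params, 3)
```

Each row can then be handed to whatever function performs the actual request.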
Disclaimer:
You probably don't need to use this. DEOS offers ways of downloading historical data as CSVs. Beyond that, if you're scraping the site, make sure you leave some time between requests; otherwise you'll be annoying the owners, and they're likely to block you or slow your responses down.
Answer:
The trick with this is that the parameters are included in the URL, so we only need to adjust those in order to get a new result.
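A minimal sketch of adjusting those parameters, under stated assumptions: the base URL here is a placeholder (example.org), because the real results URL is not shown in this thread and should be copied from the browser's address bar after submitting the form. The year parameter is likewise assumed, and build_url and fetch_day are hypothetical helper names. If the form turns out to submit via POST rather than GET, this approach would need to change.

```r
# PLACEHOLDER endpoint: replace with the URL the browser shows after
# submitting the form. The real one is not given in this thread.
base_url <- "https://example.org/daily_results.php"

# Assemble the query string from the form field names.
# `station`, `month`, `day` appear in the question's HTML; `year` is assumed.
build_url <- function(base_url, station, date) {
  date <- as.Date(date)
  query <- c(
    station = station,
    month   = as.integer(format(date, "%m")),
    day     = as.integer(format(date, "%d")),
    year    = as.integer(format(date, "%Y"))
  )
  paste0(base_url, "?", paste0(names(query), "=", query, collapse = "&"))
}

# Download one page and pull out any HTML tables it contains.
# Assumes the result page is plain HTML with the data in a <table>.
fetch_day <- function(url) {
  page <- rvest::read_html(url)
  rvest::html_table(page)
}

url <- build_url(base_url, "DTBR", "2023-08-02")
print(url)
# tables <- fetch_day(url)   # uncomment once base_url points at the real page
```

Changing the station code or the date then yields a new URL without touching the form itself.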
To extend this to multiple days, you could use a for loop, or map(), or any number of other functions which do roughly the same thing. But without knowing for sure what information you want from that site, I would say it's highly likely you can get it from them in other ways.
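One way the multi-day version might look, as a sketch rather than a tested scraper: fetch_one below is a stand-in that only echoes its input so the example runs offline, and polite_fetch is a hypothetical wrapper that pauses before each call, in line with the disclaimer about spacing out requests. Base lapply() is used here; purrr::map(dates, polite_fetch, pause = 0.1) takes the same arguments.

```r
# Stand-in for a real download function (ASSUMPTION: swap in your own).
# It returns a one-row data frame so the loop can be run without a network.
fetch_one <- function(date) {
  data.frame(date = date, precip = NA_real_, stringsAsFactors = FALSE)
}

# Wait `pause` seconds before every request to avoid hammering the server.
polite_fetch <- function(date, pause = 2) {
  Sys.sleep(pause)
  fetch_one(date)
}

# Dates as character strings, one per day in the range
dates <- format(seq(as.Date("2023-08-01"), as.Date("2023-08-03"), by = "day"))

# Short pause for the demo; use a longer one against a real site
results  <- lapply(dates, polite_fetch, pause = 0.1)
all_days <- do.call(rbind, results)   # stack the per-day data frames
print(all_days)
```

For a full year this makes 365 requests, so with a two-second pause the run takes a little over twelve minutes.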