I am trying to scrape the available rooms from this booking website. My Attempt:
import requests
from bs4 import BeautifulSoup

# Format: weekday day month year, e.g. "Sa 23 Sep 2023"
# At most 2 weeks in advance
date = "Sa 23 Sep 2023"
min_space = 3
only_big_rooms = False
hut_id = 92
URL = "https://www.alpsonline.org/reservation/calendar?hut_id=" + str(hut_id)
response = requests.get(URL)
# print(response.text)
soup = BeautifulSoup(response.text, 'html.parser')

for i in range(0, 13):
    bookingID = "bookingDate" + str(i)
    block = soup.find('div', {'id': bookingID})
    if block.text == date:
        print("found it once!")
    print(block.text)
    print(block)

# also failed:
for i in range(0, 13):
    bookingID = "bookingDateHidden" + str(i)
    block = soup.find('input', {'id': bookingID})
    # a hidden <input> has no text; its data would sit in the value attribute
    if block.get('value') == date:
        print("found it once!")
    print(block.get('value'))
    print(block)
When I run the code, all the labels and texts it retrieves from the site are empty, so my console output looks like this:
moritz@cupid:~$ python3 Alpines_scraper.py
<div class="main-label" id="bookingDate0"></div>
<div class="main-label" id="bookingDate1"></div>
<div class="main-label" id="bookingDate2"></div>
<div class="main-label" id="bookingDate3"></div>
<div class="main-label" id="bookingDate4"></div>
<div class="main-label" id="bookingDate5"></div>
<div class="main-label" id="bookingDate6"></div>
<div class="main-label" id="bookingDate7"></div>
<div class="main-label" id="bookingDate8"></div>
<div class="main-label" id="bookingDate9"></div>
<div class="main-label" id="bookingDate10"></div>
<div class="main-label" id="bookingDate11"></div>
<div class="main-label" id="bookingDate12"></div>
None
<input id="bookingDateHidden0" type="hidden"/>
None
<input id="bookingDateHidden1" type="hidden"/>
None
<input id="bookingDateHidden2" type="hidden"/>
...
After a bit of research, I found out that the page probably hasn't loaded its content yet at the time of the request. I also found this SO post, but couldn't really wrap my head around it, since this is my first attempt at web scraping.
If someone is willing to help me, or knows of a tutorial that deals with this kind of problem, I'd be happy to hear from you.
2 Answers
As hinted at in the comments, the data is loaded dynamically by JavaScript, so it is not in the HTML that requests receives. By inspecting the browser's developer tools you can find the API endpoint the page calls, which can then be queried directly to collect the data:
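A minimal sketch of querying such an endpoint directly, assuming it returns JSON. `API_URL` is a hypothetical placeholder (the real URL has to be copied from the Network tab), and passing `hut_id` as a query parameter is an assumption borrowed from the calendar page's own URL; the helper names `build_url` and `fetch_json` are illustrative only. It uses only the standard library so it runs without extra installs.

```python
import json
import urllib.parse
import urllib.request

# HYPOTHETICAL placeholder: copy the real request URL from the browser's
# Network tab (filter on XHR/Fetch) while the booking calendar loads.
API_URL = "https://example.invalid/endpoint-from-network-tab"


def build_url(base_url: str, params: dict) -> str:
    """Return base_url with the given query parameters URL-encoded onto it."""
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch_json(url: str, timeout: float = 10.0):
    """GET `url` and decode the JSON body, like the page's own script does."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # json.load accepts the binary response and detects UTF-8/16/32
        return json.load(response)
```

Usage would look like `data = fetch_json(build_url(API_URL, {"hut_id": 92}))`. With `requests`, as in the question, the equivalent is `requests.get(API_URL, params={"hut_id": 92}).json()`. If the endpoint rejects the request, compare the headers and cookies of the captured XHR call with what you send.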
Here is one way of getting the bookings for a particular day and hut, found by observing the network calls the page makes (in the browser's developer tools, open the Network tab and look at the XHR calls):
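Once the JSON is decoded, picking out one day is plain filtering. This is a sketch under loud assumptions: the response shape (a list of per-day dicts), the keys `date` and `freeBeds`, and ISO dates like `2023-09-23` are all hypothetical and must be checked against the real XHR response. The helpers `to_iso` and `rooms_for_day` are illustrative names that bridge the question's `date` and `min_space` variables to that assumed format.

```python
from datetime import datetime
from typing import Any

# HYPOTHETICAL field names: look at the "Preview"/"Response" pane of the
# XHR call in the Network tab and replace these with the real keys.
DATE_KEY = "date"
FREE_KEY = "freeBeds"


def to_iso(label: str) -> str:
    """Turn a calendar label like "Sa 23 Sep 2023" into "2023-09-23".

    The leading weekday is dropped because "Sa" is a German abbreviation
    that %a would not parse in an English/C locale. %b is locale-dependent
    too, so German months such as "Okt" or "Dez" need the matching locale.
    """
    day_month_year = " ".join(label.split()[1:])
    return datetime.strptime(day_month_year, "%d %b %Y").date().isoformat()


def rooms_for_day(
    entries: list[dict[str, Any]], day: str, min_space: int
) -> list[dict[str, Any]]:
    """Keep the entries for `day` that have at least `min_space` free places."""
    return [
        entry
        for entry in entries
        if entry.get(DATE_KEY) == day and entry.get(FREE_KEY, 0) >= min_space
    ]
```

With the question's variables this would read `rooms_for_day(data, to_iso(date), min_space)`, assuming `data` holds the decoded JSON list from the XHR call.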
Result in terminal: