
I am trying to scrape the available rooms from this booking website. My Attempt:

import requests
from bs4 import BeautifulSoup

#FORMAT: TT_tt_MMM_jjjj
#At max 2 weeks in advance
date = "Sa 23 Sep 2023"
min_space = 3
only_big_rooms = False
hut_id = 92


URL = "https://www.alpsonline.org/reservation/calendar?hut_id=" + str(hut_id)


response = requests.get(URL)
#print(response.text)
soup = BeautifulSoup(response.text, 'html.parser')

for i in range(0,13):
    bookingID = "bookingDate" + str(i)
    block = soup.find('div', {'id':bookingID})
    if (block.text == date):
        print("found it once!")
    print(block.text)
    print(block)

#also failed: 
for i in range(0,13):
    bookingID = "bookingDateHidden" + str(i)
    block = soup.find('input', {'id':bookingID})
    if (block.text == date):
        print("found it once!")
    print(block.value)
    print(block)

When running the code, all the labels and texts the code gets from the site are empty. So my console output looks like this:

moritz@cupid:~$ python3 Alpines_scraper.py 
<div class="main-label" id="bookingDate0"></div>
<div class="main-label" id="bookingDate1"></div>
<div class="main-label" id="bookingDate2"></div>
<div class="main-label" id="bookingDate3"></div>
<div class="main-label" id="bookingDate4"></div>
<div class="main-label" id="bookingDate5"></div>
<div class="main-label" id="bookingDate6"></div>
<div class="main-label" id="bookingDate7"></div>
<div class="main-label" id="bookingDate8"></div>
<div class="main-label" id="bookingDate9"></div>
<div class="main-label" id="bookingDate10"></div>
<div class="main-label" id="bookingDate11"></div>
<div class="main-label" id="bookingDate12"></div>
None
<input id="bookingDateHidden0" type="hidden"/>
None
<input id="bookingDateHidden1" type="hidden"/>
None
<input id="bookingDateHidden2" type="hidden"/>
...

After a bit of research, I found out that the site probably hadn’t loaded its content yet at the time of the request. I also found this SO post, but couldn’t really wrap my head around it, since this is my first try at web scraping.

If someone is willing to help me or knows of a tutorial that deals with this kind of problem, I’d be happy to hear from you.

2 Answers


  1. As hinted at in the comments, the data is loaded dynamically by JavaScript, so it is not in the HTML that requests receives. The browser's developer tools (Network tab) reveal the API endpoint, which can then be queried directly to collect the data:

    import requests
    
    hut_id = 92
    
    headers = {
        'accept': 'application/json, text/javascript, */*; q=0.01',
        'referer': f'https://www.alpsonline.org/reservation/calendar?hut_id={hut_id}'
    }
    
    params = {
        'date': '12.09.2023',
    }
    
    with requests.Session() as session:
        # First visit the referer page to potentially receive initial cookies
        session.get(f'https://www.alpsonline.org/reservation/calendar?hut_id={hut_id}', headers=headers)
    
        # Now make the actual request with the session, which now contains the cookies
        response = session.get('https://www.alpsonline.org/reservation/selectDate', params=params, headers=headers)
    
    data = response.json()
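
The date in params above is hard-coded. A minimal sketch of building it from a Python date, assuming the endpoint expects the DD.MM.YYYY form seen in the request and that the calendar covers at most two weeks ahead, as the comment in the question states. The helper names to_api_date and within_booking_window are hypothetical, not part of the site's API:

```python
from datetime import date, timedelta


def to_api_date(day: date) -> str:
    """Format a date as DD.MM.YYYY, the form used in the request above."""
    return day.strftime("%d.%m.%Y")


def within_booking_window(day: date, today: date, max_days: int = 14) -> bool:
    """Return True if `day` lies between `today` and `today + max_days`, inclusive.

    The 14-day default is an assumption taken from the question's comment.
    """
    return today <= day <= today + timedelta(days=max_days)


# Build the query parameters for the selectDate request
params = {"date": to_api_date(date(2023, 9, 12))}
print(params)
```

The resulting dictionary can be passed as params to session.get in place of the literal one.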
    
  2. Here is one way of getting the bookings for a particular day and hut, found by observing the network calls made by the page (see Dev tools > Network tab > XHR calls in the browser):

    import requests
    import pandas as pd
    
    headers = {
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'
    }
    
    s = requests.Session()
    s.headers.update(headers)
    # Visit the calendar page first so the session can pick up any initial cookies
    s.get('https://www.alpsonline.org/reservation/calendar?hut_id=92')
    # Then query the endpoint the page itself calls for its availability data
    r = s.get('https://www.alpsonline.org/reservation/selectDate?date=15.09.2023')
    # Each value in the JSON object is a list; keep the first entry of each
    json_list = [value[0] for value in r.json().values()]
    df = pd.DataFrame(json_list)
    print(df)
    

    Result in terminal:

        hutBedCategoryId    bedCategoryId   bedCategoryType bookingEnabled  closed  events  freeRoom    futureHutOccupancyShown reservationDate totalRoom   contingentId    contingentText  contingentTitle unserviced  unservicedWaitingList   hutDefaultLanguage  reservedRoomsRatio  eventsAsString
    0   542 4   ROOM    True    False   []  3   False   15.09.2023  31  20363   None    None    False   False   de_AT   0.903226    
    1   542 4   ROOM    True    False   []  0   False   16.09.2023  31  20363   None    None    False   False   de_AT   1.000000    
    2   542 4   ROOM    True    False   []  29  False   17.09.2023  31  20363   None    None    False   False   de_AT   0.064516    
    3   542 4   ROOM    True    False   []  23  False   18.09.2023  31  20363   None    None    False   False   de_AT   0.258065    
    4   542 4   ROOM    True    False   []  17  False   19.09.2023  31  20363   None    None    False   False   de_AT   0.451613    
    5   542 4   ROOM    True    False   []  0   False   20.09.2023  31  20363   None    None    False   False   de_AT   1.000000    
    6   542 4   ROOM    True    False   []  22  False   21.09.2023  31  20363   None    None    False   False   de_AT   0.290323    
    7   542 4   ROOM    True    False   []  0   False   22.09.2023  31  20363   None    None    False   False   de_AT   1.000000    
    8   542 4   ROOM    True    False   []  0   False   23.09.2023  31  20363   None    None    False   False   de_AT   1.000000    
    9   542 4   ROOM    True    False   []  27  False   24.09.2023  31  20363   None    None    False   False   de_AT   0.129032    
    10  542 4   ROOM    True    False   []  29  False   25.09.2023  31  20363   None    None    False   False   de_AT   0.064516    
    11  542 4   ROOM    True    False   []  10  False   26.09.2023  31  20363   None    None    False   False   de_AT   0.677419    
    12  542 4   ROOM    True    False   []  31  False   27.09.2023  31  20363   None    None    False   False   de_AT   0.000000    
    13  542 4   ROOM    True    False   []  28  False   28.09.2023  31  20363   None    None    False   False   de_AT   0.096774    
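
To get from this result to what the question asks for (is there enough space on a given date?), the parsed JSON can be filtered directly. The following is only a sketch: it assumes the payload maps keys to lists of dictionaries carrying the reservationDate, freeRoom, closed and bookingEnabled fields shown above. The function name free_rooms_on and the sample data, including its keys, are hypothetical and hand-written to mirror three rows of the output.

```python
def free_rooms_on(payload: dict, day: str, min_space: int = 1) -> list:
    """Return the entries for `day` (DD.MM.YYYY) with at least `min_space` free places.

    Every entry of every list is checked, not only the first one.
    """
    matches = []
    for entries in payload.values():
        for entry in entries:
            if entry.get("reservationDate") != day:
                continue
            # Skip days that are closed or not open for booking
            if entry.get("closed") or not entry.get("bookingEnabled", False):
                continue
            if (entry.get("freeRoom") or 0) >= min_space:
                matches.append(entry)
    return matches


# Hand-written sample mirroring three rows of the output above (keys are made up)
sample = {
    "0": [{"reservationDate": "15.09.2023", "freeRoom": 3,
           "closed": False, "bookingEnabled": True}],
    "1": [{"reservationDate": "16.09.2023", "freeRoom": 0,
           "closed": False, "bookingEnabled": True}],
    "8": [{"reservationDate": "23.09.2023", "freeRoom": 0,
           "closed": False, "bookingEnabled": True}],
}

for wanted in ("15.09.2023", "23.09.2023"):
    hits = free_rooms_on(sample, wanted, min_space=3)
    print(wanted, "available" if hits else "not available")
```

With live data, r.json() would be passed in place of sample.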
    