
I am having difficulty displaying the product list. I want to scrape data from a webpage, but I am very new to Python and web scraping. The print(productlist) call is not working.

import requests
from bs4 import BeautifulSoup
import pandas as pd

baseurl = "https://www.thewhiskyexchange.com"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,  like Gecko) Chrome/89.0.4389.82 Safari/537.36'}

k = requests.get('https://www.thewhiskyexchange.com/c/35/japanese-whisky')
soup=BeautifulSoup(k.text,'html.parser')
productlist = soup.find_all("li",{"class":"product-grid__item"})
print(productlist)

3 Answers


  1. There is nothing wrong with your usage of BeautifulSoup. The problem lies with the site: it is protected by Cloudflare, and attempts to scrape it are met with a JavaScript challenge, a form of CAPTCHA.

    In this case, there is not much you can do to bypass CloudFlare.

    You can verify this by using curl: curl -L https://www.thewhiskyexchange.com. In the response, you can see this:

    <span id="challenge-error-text">Enable JavaScript and cookies to continue</span>
    

    which is a sign that your scraper is being blocked.
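
A small check like the following can make that sign explicit in a script instead of reading the response by eye. It is only a sketch: the marker strings are taken from the challenge page returned for this URL, the helper name looks_like_challenge is made up for illustration, and Cloudflare may change its markup at any time.

```python
# Strings that appear in the Cloudflare challenge page returned for this URL.
# They are assumptions about the current challenge markup, not a stable API.
CHALLENGE_MARKERS = (
    "challenge-error-text",
    "_cf_chl_opt",
    "/cdn-cgi/challenge-platform/",
)


def looks_like_challenge(html: str) -> bool:
    """Return True if the HTML contains any known challenge marker."""
    return any(marker in html for marker in CHALLENGE_MARKERS)
```

With the code from the question, calling looks_like_challenge(k.text) before parsing would show whether BeautifulSoup is being handed the challenge page rather than the product page.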

    And as @nejdetckenobi said, the website uses JavaScript to load the products, so the components would not load with requests. The following is an example using selenium instead:

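    from selenium import webdriver as wd
    from selenium.webdriver.common.by import By
    import time

    URL = 'https://www.thewhiskyexchange.com/c/35/japanese-whisky'


    def main():
        # Pass options by keyword so this works across Selenium 4 releases
        driver = wd.Chrome(options=wd.ChromeOptions())

        try:
            # Wait up to 20 seconds when looking up elements
            driver.implicitly_wait(20)
            driver.get(URL)

            # Give the page's JavaScript time to render the products
            time.sleep(5)

            products = driver.find_elements(By.CLASS_NAME, 'product-card__name')

            print([p.text for p in products])
        finally:
            # Close the browser even if something above fails
            driver.quit()


    if __name__ == '__main__':
        main()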
    

    Learn more about selenium with the documentation here.

  2. The site is protected by Cloudflare. But even if you were able to pass the challenge (which is not possible, as far as I know), the site cannot be parsed like that. Parts of the page are built by JavaScript that runs after the page has loaded. Since the requests library cannot run JavaScript, you will not get the same page you see when you open the link in your browser. Code based on requests only works for static pages whose content does not depend on JavaScript.

    You should be using a "headless web browser" or "web browser driver" with Selenium, to be able to get the exact page you see in your browser window.

    You can find the documentation in the link below:
    https://selenium-python.readthedocs.io/index.html

    The steps should be like:

    • Open the webpage with Selenium
    • Wait until the page is loaded (JS stuff will be processed by the web driver)
    • After waiting, get the HTML source from Selenium and pass it to BeautifulSoup
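
The steps above could be sketched roughly as follows. This is a minimal sketch under several assumptions: Chrome and a matching driver are available, the li.product-grid__item selector from the question is what the rendered page actually uses, and the Cloudflare challenge lets the automated browser through (if it does not, the wait raises a TimeoutException). The function names fetch_rendered_html and product_items are made up for illustration.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, timeout=20):
    """Open url in Chrome, wait for a product item, return the rendered HTML."""
    # Imported here so the parsing helper below also works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        # Step 1: open the webpage with Selenium
        driver.get(url)
        # Step 2: wait until at least one product item is present in the DOM
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "li.product-grid__item")
            )
        )
        # Step 3: hand the rendered HTML back to the caller
        return driver.page_source
    finally:
        driver.quit()


def product_items(html):
    """Parse HTML with BeautifulSoup and return the text of each product item."""
    soup = BeautifulSoup(html, "html.parser")
    items = soup.find_all("li", {"class": "product-grid__item"})
    return [li.get_text(" ", strip=True) for li in items]


# Usage (starts a real browser, so it is left as a comment):
# html = fetch_rendered_html("https://www.thewhiskyexchange.com/c/35/japanese-whisky")
# print(product_items(html))
```

Keeping the parsing in its own function means the same find_all call from the question can be tried on any saved HTML without starting a browser.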
  3. You are using BeautifulSoup correctly 🙂
    But you will need to access this webpage in another way than a simple requests.get().

    That is because what you are looking for in productlist, i.e. {"class":"product-grid__item"}, is not part of the string returned in k.text.

    You can check the contents of k.text with another print, such as print(f"k.text contains: {k.text}")

    For me this yields the string shown below.
    Maybe you need to look at another link or use another tool to get your product-grid__item elements, as they are not part of your current k.text.

    For a clue about what is wrong, look in the returned k.text:

    <span id="challenge-error-text">Enable JavaScript and cookies to continue</span> 
    

    k.text

    k.Text: Just a
    moment…Enable
    JavaScript and cookies to
    continue(function(){window._cf_chl_opt={cvId:
    '2',cZone: "www.thewhiskyexchange.com",cType: 'interactive',cNounce:
    '45162',cRay: '8098df96ecbfbe4c',cHash: '1cf80c6fbf1a491',cUPMDTk:
    "/c/35/japanese-whisky?__cf_chl_tk=VFxprfxk2K7uMv5y0LpXXlK_dih6FnMe3TwITIiFP6s-1695200377-0-gaNycGzNCfs",cFPWv:
    'b',cTTimeMs: '1000',cMTimeMs: '0',cTplV: 5,cTplB: 'cf',cK:
    "visitor-time",fa:
    "/c/35/japanese-whisky?_cf_chl_f_tk=VFxprfxk2K7uMv5y0LpXXlK_dih6FnMe3TwITIiFP6s-1695200377-0-gaNycGzNCfs",md:
    "5XihUl6T63j9F7hlv4wbmqggz9GgVMAO.1AgdNd9M60-1695200377-0-AT-SWJ4pLjfTHh1KwUSjMvpy6ktRTUDkrD4WDXOafPGNAsu8LEt3hYkTdc_tNxcaTST2AgYa4WU5vGgkAYB5yXRAEonsWe–er-AWe6ffER3PMXl692b2c6KA552e9ahh79FzxPvgDoioIXIK8EYafFjfq80nxBLiy55QfgvCxq425N4NSozOHA3nVkpHrOYpScH8FlkZAE4rEsSu_hMSGIw0Qd6pEV4FMwSlTAKF5AfVL_BgRAjcZE076Lb1Nxdfr89FI77XxvGnMjimmxXXGRSkHDGPfKGgF0RTAB90ETVVHOfl8W-Nh5pAjIZERSDGIu2v3Sf-GRCnfRtDj0IXIAMTvVS_aES8E0no9HYlhIPJ-XdAQDd5Dm6YNUN9SXOjoQYGX4G16wF0ka1HKbfW1GU64Q1F71uM-vnoFkrFOU2fC8hb1Y7T-xrDweMQCZg0b2vAy1Id2U3wZh7MO7bwgsJ35DmcuP60CjunWuXAP9bS1kgOXkhKks9a1RN5d4TJNSYIAK0S8WHPu3Y5rtz4jcoxSTAMPLNgkMA3lBxi2fl2Oa-fjxffzyXpQGBz3rnrH7YxDIx6w2PhMYSaPAMEV1VzmV61rd5uR3Fm_HKHGFV15V5JrEQ8m4_rtkW4IzBEh4wRg1KQZTFy8qHBiqooPxItOPcBaAGokDe7gYi2A5fxao7f8qilJ1MHghhW436rbg872ZL8OZDHzsI3z-y_QIjN5Ncpu2uUWjfHFGe-qStrMlusJp6-JidzsdfG9ERj0AWdfIwp2dLff6LbaoA4PwBKnJsvcSqt7E0KB3W0jyb8ulriSOb1mFzXl9STkn7u9arts_7qVVLJR82bN81dBwkyf0aoqc3nZK98i3dv8zDo_Q4OtWfayiUcj4uwZtMcupxXbxSim7T1llhDCbdsjh0GeiWNOKAL-iClod2ru-Iziq_IID8NzUM8RbGVl98Ooeiqa1AVHBmNBTg3Nno6vuRKJWGp5VlkaHfLX0rTUAr43kPiLxqkzJA88snXuaXc3hmFB_74z-ddRDYt1n6Jzm8i9NUtEk76hERfI_1TtoV9sR8MaNo0A8NXMulu_KdMpdZX2IyNs-lq_DonUgfsaZndO5zgSFChsPpqJOQClCIxnIze0pC5VTjqkeTQXQ__DkK4Mt1A8ubzLAkydDXzWyD_3ooToYaOZNeQuUlcrlwMzDi2fn1CB_tFRrAksVDNAhLP7a1WfDd768jLhCGrfG_SQxTfy7fhf-rGfH8U0flEQHda2SshfusSaurdOVCPI4cIgMznEUhHBdWBst79bIgXbpCa6A_53E8wNteOBYNhBsGGRrdh5Wh2tWe8cvwp7lP3f5tz9PG_555Rs6BZx-YKIh4t-IAtXDTS7E_BvwEayNhL8XWdWH-Bg074utgU_IdA7acHszhAMlXlm8GaWgTKw0WhDw6ipBJrtzcVY-pHALdJlBTXxfsPIsPE6oBNBhmVQTpeHpIFvq6V9ypwTxopgb8ySoYS17ViS0ZSSRbihLtGEoj8S-qBP8-Y_1Yhu_UhGLD9J5LKHza8R4Qar4-KUKaf7yfeKBm6MVTFEi_sqiMYGmjmfbv2QyXHHjAF3MYsRSMWQGQaP3XDJ_sl9j2Wd8_82cfUKwn9ut6EfmzRtiqTMTHA3jNIrt0qNZffIhBI8yjYpud5SGgK0AOrwGY_fCYXGnTek6Ez6Z7QdQf2N2sDJ3CMa8KaQvaG1KZbEwrcW3IBet32RA54pePEQHQOBgb1Fi4rUrp6TzwWhmOzd5TRmU52ep5hMqNZXrqkcAyCyTl0fJQFdDjlrI7zzjDZu_5BLVoQ1Va0PAZN-UxuR7aatPtO2HLAasKvDFkNabVwUpOw1ivOLvsRSUgKQIZYjTLdFyyfkVMut6fEMUH65lb89Y1MvEgn8aBriDLMuJ4zjbgU89khvxYikU
Gyza6Lj6BP7huIsDpxI3JeZU12-tVzUHkCBZVEjW9Z7kvGFOI20VfGRX4ukzNZ0an_PtF3AP-exf0zZ3zM9s73lpmJv3rQ1c8JP6COSRAxumgntEqUe1NfUs3RNm056WNSvRvdqtUisLaqBMw592bEc5hTQz_sQrmFakRt3r_prmDWd4PVw4dLmxrDkBXp0cLVzFCyDNK2rXneD9eLnaNpdzZaGtZ3fPBf1D1wilDyavdXCnn-ibDA4lGymdeRqwPKLV
    -bZLv2m1dtnwpHb9K7KjfSUjFyyXfo4DbtYON5OOxSOHwsjMTFeEdhd31JT-_SamVuXkntC2mIutkJc20RvNkJ1Erf9crXYHRWy3muQdQQWZPartYMiLSFNn1jQs-5OA79zqQ0AcgDHE4jKNEEpP5QDqFZniOqrEhSl7pL_890eigayz1N_dmtSe62-2Py4c-J9ZB3zgsFjR7xk2z5B2xxgacVS7JDFss0XxahwdHyiGPhojh1ChlhF3H2qGy0yAMoqnm3eJRhLXRCBIWYKvwpXxG-fD4XZfSO2-OBRuDA",cRq:
    {ru:
    'aHR0cHM6Ly93d3cudGhld2hpc2t5ZXhjaGFuZ2UuY29tL2MvMzUvamFwYW5lc2Utd2hpc2t5',ra:
    'cHl0aG9uLXJlcXVlc3RzLzIuMzEuMA==',rm: 'R0VU',d:
    'JLbxFktqD2tP7FpVa04CFkkVQn7UyCxHXJttuNhO6cWq6Tiq3v7R+45u9vexHyIIgic4OnJUivb+/wMvcvf+1v1dt5hpQYPW8jf5RTRpHptxLJgTwfzezI6h6A6xoFxLm9CamevA9PpsV7F6ZMeZXBfc2TqhOtVRQ6mUFa1XOwrp7I4WLFESubrd383dvoKOTVb3f6x07teL+LQRn15UbRDRShMkA2bYmoeEWVTGK2CEeXaqaV/3NcUcOjyPLgptFRtGsk/Xngmfjx0rAf+Dn2t/FmujRO5zVLGczuJpyYhVeNy9d1AwvAGbcBZoLfstPsiqY+L7pgwqDuP12vOnJTiUcHiBfLPTU1qVdJglwfNxN3CVUmRh2Vt8QZoVgUwvtekgk7vJEfEtM9SQt9Ec06vXh4M+fM0RVlQ9JqyC7YPt+haZL9RsimlRJVLvVDIpSwivepmxSm8nb9PaspXm3WHx6NZMAm38Uvdd0N0GHN4m9oSwixUYs1hiaYE5T9EXYInX8GPhEEWKUr0S+8+qO62fU71G5krgZ359DU4hN8w=',t:
    'MTY5NTIwMDM3Ny40MzkwMDA=',cT: Math.floor(Date.now() / 1000),m:
    'a2AsyNIN/NAij7HFCGrX3XvzT5X+w1ymY4tyAvbok+k=',i1:
    'gUidhQr6vHAcLFLeJUIaGg==',i2: '78MQeBOnbKsIzIbpivDnGA==',zh:
    'Lu11UPKpct80h4UpHjnFr7549vZ5EGi2V7KjmFdfcoc=',uh:
    'YE9XOpG5TeHmhA1zfs5mxC8CrRZzq2a/+r+OU7dliYQ=',hh:
    '1iL0YmRIkuteIeg9zu2NzR9zkRexldgCzJYoCkFcSM8=',}};var cpo =
    document.createElement('script');cpo.src =
    '/cdn-cgi/challenge-platform/h/b/orchestrate/chl_page/v1?ray=8098df96ecbfbe4c';window._cf_chl_opt.cOgUHash
    = location.hash === '' && location.href.indexOf('#') !== -1 ? '#' : location.hash;window._cf_chl_opt.cOgUQuery = location.search === '' &&
    location.href.slice(0, location.href.length -
    window._cf_chl_opt.cOgUHash.length).indexOf('?') !== -1 ? '?' :
    location.search;if (window.history && window.history.replaceState)
    {var ogU = location.pathname + window._cf_chl_opt.cOgUQuery +
    window._cf_chl_opt.cOgUHash;history.replaceState(null, null,
    "/c/35/japanese-whisky?__cf_chl_rt_tk=VFxprfxk2K7uMv5y0LpXXlK_dih6FnMe3TwITIiFP6s-1695200377-0-gaNycGzNCfs" + window._cf_chl_opt.cOgUHash);cpo.onload = function() {history.replaceState(null, null,
    ogU);}}document.getElementsByTagName('head')[0].appendChild(cpo);}());
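
Some of the short values in the cRq block of that output are plain Base64, and decoding them gives a hint about how the request looked to the server. The sketch below decodes three of them with the standard library. Reading ru, ra and rm as request URL, request agent and request method is my interpretation of the decoded values, not something Cloudflare documents.

```python
import base64

# Values copied from the cRq block of the challenge page shown above.
challenge_fields = {
    "ru": "aHR0cHM6Ly93d3cudGhld2hpc2t5ZXhjaGFuZ2UuY29tL2MvMzUvamFwYW5lc2Utd2hpc2t5",
    "ra": "cHl0aG9uLXJlcXVlc3RzLzIuMzEuMA==",
    "rm": "R0VU",
}

# Decode each Base64 value into readable text.
decoded = {
    name: base64.b64decode(value).decode("utf-8")
    for name, value in challenge_fields.items()
}

for name, value in decoded.items():
    print(f"{name}: {value}")

# Output:
# ru: https://www.thewhiskyexchange.com/c/35/japanese-whisky
# ra: python-requests/2.31.0
# rm: GET
```

The ra value shows the request announced itself as python-requests/2.31.0. That fits the code in the question, which defines a headers dictionary but never passes it to requests.get. Passing headers=headers would change the User-Agent that is sent, although that alone may well not be enough to get past the challenge.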
