
I am very new to coding and am trying to write a practice web-scraping script in the VS Code editor. But every time I run the script, I get no real output. Can you please advise on what the issue is? Note: the pink boxes in the screenshot are just covering my name.

I tried running the code and expected scraped data from the link. I have tried many different scripts and the same issue happens, so I think there must be something wrong with the whole system.

2 Answers


  1. VSCode is an excellent IDE, but when you start a new project (or open a folder in VSCode), it does not come with any build tools, compilers, or interpreters set up. You have to configure the environment yourself, using the right toolchain for each language. Here are some instructions for Python.
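A quick way to check which Python interpreter VS Code is actually running your script with (the one picked via the "Python: Select Interpreter" command) is a tiny standard-library script like the one below. It is a minimal sketch: the `environment_report` helper is a hypothetical name, and the package list is just an assumption about what a scraping script would import.

```python
import importlib.util
import sys


def environment_report(packages=("requests", "bs4", "selenium")):
    """Return the interpreter path, Python version, and which packages can be imported."""
    found = {name: importlib.util.find_spec(name) is not None for name in packages}
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "packages": found,
    }


report = environment_report()
print("Interpreter:", report["executable"])
print("Version:", report["version"])
for name, ok in report["packages"].items():
    # A package pip-installed into a *different* interpreter shows up as missing here
    print(f"{name}: {'installed' if ok else 'missing'}")
```

If a package you installed with `pip` is reported as missing, `pip` and VS Code are most likely pointing at different interpreters.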

  2. This is not a problem with VSCode, but I'll answer your question anyway.

    You can't scrape indeed.com with requests and Beautiful Soup because the site has bot protection through Cloudflare. If you take a closer look at the response, it returns a 403 Forbidden status code instead of 200 OK. Instead, you can scrape it with a headless browser driven by Selenium.
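You can confirm the status code yourself before switching tools. The sketch below uses only the standard library's `urllib`, so nothing extra needs installing; with `requests`, the same number is in `response.status_code`. The helper names `get_status` and `explain` are hypothetical, and the browser-like `User-Agent` header is an assumption, not something the site documents.

```python
import urllib.error
import urllib.request


def get_status(url, timeout=10.0):
    """Return the HTTP status code of a GET request to url."""
    # Assumed browser-like User-Agent; urllib's default one is often refused outright
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses, but the status code is still available
        return err.code


def explain(status):
    """Turn a status code into a short human-readable hint."""
    if status == 200:
        return "200 OK: the page was served, so it can be parsed with Beautiful Soup."
    if status == 403:
        return "403 Forbidden: the server refused the request (for example, bot protection)."
    return f"{status}: unexpected status code."
```

For example, `print(explain(get_status("https://www.indeed.com/companies/best-Agriculture-companies")))` should report 403 if the site is blocking scripted requests as described above; the result can change whenever the site changes its protection.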

    Here’s an example using Selenium.

    First, install selenium and webdriver_manager:

    pip install selenium webdriver_manager
    
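    from selenium.webdriver import Chrome, ChromeOptions
    from selenium.webdriver.common.by import By
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    from webdriver_manager.chrome import ChromeDriverManager
    
    options = ChromeOptions()
    # Run Chrome without a visible window (remove this line to watch the browser)
    options.add_argument("--headless=new")
    # Hide the usual automation flags; some sites refuse access when they see them
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    
    driver = Chrome(options=options, service=Service(
        ChromeDriverManager().install()))
    
    try:
        # Make sure you are not detected as HeadlessChrome, some sites will refuse access
        ua = driver.execute_script("return navigator.userAgent").replace(
            "HeadlessChrome", "Chrome")
        driver.execute_cdp_cmd("Network.setUserAgentOverride", {"userAgent": ua})
        # Hide navigator.webdriver on every page loaded from now on
        # (a plain execute_script would only patch the current blank page)
        driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
            "source": "Object.defineProperty(navigator,'webdriver',{get:()=>undefined});"})
    
        driver.get("https://www.indeed.com/companies/best-Agriculture-companies")
        # Wait up to 10 seconds for the main content to appear, then print it
        main = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "main")))
        print(main.text)
    finally:
        # Always close the browser, even if something above fails
        driver.quit()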
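Once the page has loaded, the element's HTML still has to be turned into data. As a minimal sketch, assuming you pass in `main.get_attribute("outerHTML")` (or `driver.page_source`), the parser below collects every link's text and `href` with the standard library's `html.parser`. It does not know Indeed's actual markup, so deciding which links matter is left to you; `LinkCollector` and `extract_links` are hypothetical names. With Beautiful Soup, `soup.find_all("a")` does the same job.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag we are currently inside, if any
        self._text = []    # text chunks seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # <a> tags without an href leave _href as None and are skipped
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace into single spaces
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def extract_links(html):
    """Return a list of (text, href) tuples found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Usage with the Selenium code: `for text, href in extract_links(main.get_attribute("outerHTML")): print(text, href)`.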
    