
I am having issues with parsing a website: the request returns a "403 Forbidden" error. Does that mean I cannot scrape the website? If so, is there some sort of workaround?

import requests
from bs4 import BeautifulSoup
import lxml

URL = 'https://frequentmiler.com/best-credit-card-sign-up-offers/'
webpage = requests.get(URL)

# Parse the response body with the lxml parser
soup = BeautifulSoup(webpage.content, 'lxml')

print(soup.prettify())

This returns:

<html>
 <head>
  <title>
   403 Forbidden
  </title>
 </head>
 <body>
  <center>
   <h1>
    403 Forbidden
   </h1>
  </center>
  <hr/>
  <center>
   nginx
  </center>
 </body>
</html>

2 Answers


  1. That means that you’re not authorized to view that URL.
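
    You can confirm this from the response object itself; a minimal sketch using the same request as in the question:

    import requests

    URL = 'https://frequentmiler.com/best-credit-card-sign-up-offers/'
    webpage = requests.get(URL)

    # 403 means the server understood the request but refuses to fulfil it
    print(webpage.status_code)  # 403 for this page, matching the HTML shown above
    print(webpage.reason)       # 'Forbidden'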

  2. The website can tell that you’re requesting the page from Python code; you can get around this by adding a user-agent to the request headers.

    # Send a browser-like user-agent so the server treats the request as a normal visit
    headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36"}
    webpage = requests.get(URL, headers=headers)
    

    And now, to the website you look like a human visitor using an ordinary web browser =).
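
    Putting the header together with the original script, here is a minimal sketch (the user-agent string is just the example above; any current browser string should work):

    import requests
    from bs4 import BeautifulSoup

    URL = 'https://frequentmiler.com/best-credit-card-sign-up-offers/'
    # Browser-like user-agent so the server does not answer with 403 Forbidden
    headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36"}
    webpage = requests.get(URL, headers=headers)
    webpage.raise_for_status()  # raises an HTTPError if the server still refuses

    soup = BeautifulSoup(webpage.content, 'lxml')
    print(soup.prettify())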
