I am having issues parsing a website: the request comes back with a "403 Forbidden" error. Does that mean I cannot scrape the website at all? If so, is there some sort of workaround?
import requests
from bs4 import BeautifulSoup
URL = 'https://frequentmiler.com/best-credit-card-sign-up-offers/'
webpage = requests.get(URL)
soup = BeautifulSoup(webpage.content, 'lxml')
print(soup.prettify())
This returns:
<html>
<head>
<title>
403 Forbidden
</title>
</head>
<body>
<center>
<h1>
403 Forbidden
</h1>
</center>
<hr/>
<center>
nginx
</center>
</body>
</html>
2 Answers
That means you're not authorized to view that URL. The website can tell that the request is coming from a Python script rather than a browser (the default `requests` User-Agent gives it away), so it rejects it. You can work around this by adding a browser-like `User-Agent` to the request headers. To the server, you then look like a regular visitor using a web browser =).
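A minimal sketch of that fix, reusing the URL from the question. The specific User-Agent string below is just an example; any recent browser UA string should work, and the `try/except` is optional defensiveness in case the request still fails:

```python
import requests
from bs4 import BeautifulSoup

URL = 'https://frequentmiler.com/best-credit-card-sign-up-offers/'

# Browser-like User-Agent so the server does not reject the request
# as coming from a script. The exact string is only an example.
headers = {
    'User-Agent': (
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        'AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/120.0.0.0 Safari/537.36'
    )
}

try:
    webpage = requests.get(URL, headers=headers, timeout=10)
    webpage.raise_for_status()  # raises if the server still returns 403
    soup = BeautifulSoup(webpage.content, 'lxml')
    print(soup.title.get_text(strip=True))
except requests.RequestException as exc:
    print(f'Request failed: {exc}')
```

Note that spoofing a User-Agent only gets past this simple check; sites with stricter bot protection may still block you, and you should respect the site's robots.txt and terms of service.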