I am having issues parsing a website: the request comes back with a "403 Forbidden" error. Does that mean I cannot scrape the website at all? If so, is there some sort of workaround?
import requests
from bs4 import BeautifulSoup
URL = 'https://frequentmiler.com/best-credit-card-sign-up-offers/'
webpage = requests.get(URL)
soup = BeautifulSoup(webpage.content, 'lxml')
print(soup.prettify())
This returns:
<html>
<head>
<title>
403 Forbidden
</title>
</head>
<body>
<center>
<h1>
403 Forbidden
</h1>
</center>
<hr/>
<center>
nginx
</center>
</body>
</html>
2 Answers
That means you're not authorized to view that URL. The website can tell that the request is coming from a Python script rather than a browser (the default `requests` User-Agent gives it away), so it rejects it. You can work around this by adding a browser-like `User-Agent` to the request headers. To the server, you then look like a regular visitor using a web browser =).
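A minimal sketch of that fix, reusing the URL from the question. The specific User-Agent string below is just an example; any recent browser UA string should work, and the `try/except` is optional defensiveness in case the request still fails:

```python
import requests
from bs4 import BeautifulSoup

URL = 'https://frequentmiler.com/best-credit-card-sign-up-offers/'

# Browser-like User-Agent so the server does not reject the request
# as coming from a script. The exact string is only an example.
headers = {
    'User-Agent': (
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        'AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/120.0.0.0 Safari/537.36'
    )
}

try:
    webpage = requests.get(URL, headers=headers, timeout=10)
    webpage.raise_for_status()  # raises if the server still returns 403
    soup = BeautifulSoup(webpage.content, 'lxml')
    print(soup.title.get_text(strip=True))
except requests.RequestException as exc:
    print(f'Request failed: {exc}')
```

Note that spoofing a User-Agent only gets past this simple check; sites with stricter bot protection may still block you, and you should respect the site's robots.txt and terms of service.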