The script below is meant to go through eBay listings on the eBay search results page. The results page is just a list, so I am trying to loop through each li tag and add its text to a variable. For some reason the script doesn't work, and I'm not sure why.
```python
from urllib.request import urlopen
from bs4 import BeautifulSoup

# specify the url
url = "https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=funko+gamora+199&_sacat=0&LH_Sold=1&LH_Complete=1&rt=nc&LH_PrefLoc=1&_ipg=200"

# Connect to the website and return the html to the variable 'page'
try:
    page = urlopen(url)
except Exception as error:
    # Stop here: without a response there is nothing to parse
    raise SystemExit(f"Error opening the URL: {error}")

# parse the html using beautiful soup and store in variable 'soup'
soup = BeautifulSoup(page, 'html.parser')

# Find the <ul> that holds the search results
content = soup.find('ul', {"class": "srp-results srp-list clearfix"})
#print(content)

article = ''
for i in content.find_all('li'):
    article = article + ' ' + i.text
print(article)

# Saving the scraped text
with open('scraped_text.txt', 'w', encoding='utf-8') as file:
    file.write(article)
```
Can anyone see where I’m going wrong?
Answers
The response eBay sends back is an error page rather than the search results, so the problem is on eBay's end; your code looks fine at first glance. Also note that web scraping is a grey area and some companies do not allow it. You might need to get around their security measures.
Also, you should comment your code in such a way that it tells the reader WHY your code does what it does, not what it does. You don't need comments on lines like soup = BeautifulSoup(page, 'html.parser').
Edit: I forgot to mention that the error appears because soup.find() found no results, so content is None and looping over its li tags raises an AttributeError.
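A minimal sketch of a check that makes this failure explicit instead of ending in an AttributeError. It assumes the results list still carries the srp-results class (taken from the question's selector) and reads the page title as a hint about what eBay sent instead; find_results_list is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def find_results_list(html: str):
    """Return the search-results <ul>, or raise an error that says what eBay sent instead."""
    soup = BeautifulSoup(html, "html.parser")
    # Matching one class is less brittle than the exact "srp-results srp-list clearfix" string
    content = soup.find("ul", class_="srp-results")
    if content is None:
        # The page title often tells a CAPTCHA/block page apart from a real results page
        title = soup.title.get_text(strip=True) if soup.title else "(no <title>)"
        raise RuntimeError(f"No results list found; eBay returned a page titled {title!r}")
    return content
```

Passing the downloaded HTML through this helper turns a confusing NoneType error into a message that points at the blocked response.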
Most likely you are getting a CAPTCHA or hitting an IP rate limit. There are ways to avoid being blocked, such as sending browser-like request headers.
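A minimal sketch of the header approach using only the standard library. The User-Agent string is an arbitrary example, build_request and fetch_html are hypothetical helper names, and eBay may still block or challenge requests that carry this header.

```python
from urllib.request import Request, urlopen

# Example browser-like header; urllib's default "Python-urllib/3.x" agent is easy to flag
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}


def build_request(url: str) -> Request:
    """Wrap the URL in a Request that carries the custom headers."""
    return Request(url, headers=HEADERS)


def fetch_html(url: str, timeout: float = 30) -> str:
    """Download the page and decode it, replacing any undecodable bytes."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

In the question's script, page = urlopen(url) would become html = fetch_html(url), followed by soup = BeautifulSoup(html, 'html.parser').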
If you need to extract all results from all pages, one solution is non-token pagination: increment the page number on every request and test for something on the page (a button or element) whose absence ends the loop.
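A minimal sketch of that loop, assuming eBay's _pgn URL parameter selects the page and that listing titles and the next-page link use the classes s-item__title and pagination__next. These selectors are assumptions (inspect the live page, since eBay changes its markup), and fetch_ebay_page and scrape_all_pages are hypothetical names. The fetch function is passed in, so the loop can also be tried on saved HTML.

```python
from typing import Callable
from urllib.parse import urlencode
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Assumed eBay class names; check them against the live page in your browser's dev tools
TITLE_SELECTOR = ".s-item__title"
# Also skip a "next" link that is rendered but disabled (assumed aria-disabled marker)
NEXT_PAGE_SELECTOR = 'a.pagination__next:not([aria-disabled="true"])'


def fetch_ebay_page(page_number: int) -> str:
    """Download one results page; _pgn is the page-number URL parameter."""
    params = {"_nkw": "funko gamora 199", "LH_Sold": 1, "LH_Complete": 1,
              "_ipg": 200, "_pgn": page_number}
    url = "https://www.ebay.co.uk/sch/i.html?" + urlencode(params)
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_all_pages(fetch_html: Callable[[int], str]) -> list[str]:
    """Collect listing titles page by page until there is no usable "next" link."""
    titles = []
    page_number = 1
    while True:
        soup = BeautifulSoup(fetch_html(page_number), "html.parser")
        titles.extend(tag.get_text(strip=True) for tag in soup.select(TITLE_SELECTOR))
        # Exit condition: the last page has no clickable "next" link
        if soup.select_one(NEXT_PAGE_SELECTOR) is None:
            break
        page_number += 1
    return titles
```

To run it against eBay, call scrape_all_pages(fetch_ebay_page). If eBay kept rendering an enabled-looking next link on the last page, this loop would never end, which is one reason to also cap the number of pages.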
You can also exit the loop after a fixed number of retrieved pages by adding a limit.
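A minimal sketch of a page-limited loop, under the same assumptions: a fetch function that downloads page N (for example via eBay's _pgn parameter) and the s-item__title / pagination__next classes, both of which need checking against the live page. scrape_pages and page_limit are hypothetical names. A for loop over range() makes the cap explicit, while the next-link check still ends the loop early on the last page.

```python
from typing import Callable

from bs4 import BeautifulSoup

# Assumed eBay class names (hypothetical until checked against the live page)
TITLE_SELECTOR = ".s-item__title"
NEXT_PAGE_SELECTOR = 'a.pagination__next:not([aria-disabled="true"])'


def scrape_pages(fetch_html: Callable[[int], str], page_limit: int = 5) -> list[str]:
    """Collect listing titles, stopping at the last page or after page_limit pages."""
    titles = []
    for page_number in range(1, page_limit + 1):  # hard upper bound on requests
        soup = BeautifulSoup(fetch_html(page_number), "html.parser")
        titles.extend(tag.get_text(strip=True) for tag in soup.select(TITLE_SELECTOR))
        if soup.select_one(NEXT_PAGE_SELECTOR) is None:
            break  # no more pages, even though the limit was not reached
    return titles
```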
As an alternative, you can use the eBay Organic Results API from SerpApi. It's a paid API with a free plan that handles blocks and parsing on their backend.
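A minimal sketch of paging through SerpApi's eBay results with plain HTTP calls from the standard library. The endpoint, the engine=ebay, _nkw, ebay_domain and _pgn parameters, and the organic_results / serpapi_pagination response keys are assumptions based on SerpApi's public documentation; check them there before relying on this. build_serpapi_url and fetch_all_titles are hypothetical names, and you need your own API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint and parameter names; verify against SerpApi's eBay Search API docs
SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def build_serpapi_url(query: str, page: int, api_key: str,
                      ebay_domain: str = "ebay.co.uk") -> str:
    params = {
        "engine": "ebay",
        "_nkw": query,  # search keywords, same name as in eBay's own URL
        "ebay_domain": ebay_domain,
        "_pgn": page,  # page number
        "api_key": api_key,
    }
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


def fetch_all_titles(query: str, api_key: str, page_limit: int = 5) -> list[str]:
    titles = []
    for page in range(1, page_limit + 1):
        with urlopen(build_serpapi_url(query, page, api_key), timeout=60) as response:
            data = json.load(response)
        results = data.get("organic_results", [])
        titles.extend(result.get("title", "") for result in results)
        # Stop when a page is empty or no "next" page is advertised
        if not results or "next" not in data.get("serpapi_pagination", {}):
            break
    return titles
```

Usage would look like fetch_all_titles("funko gamora 199", api_key="YOUR_API_KEY"); the page_limit argument keeps the number of billed searches bounded.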
There is a blog post, "13 ways to scrape any public data from any website", if you want to know more about web scraping.