I tried to write a small Beautiful Soup script to analyze prices on eBay. The problem is that my soup.findAll() call, which should find the prices, sometimes works and sometimes does not, and I am wondering why. Here is my code:
import requests
from bs4 import BeautifulSoup
from requests.models import encode_multipart_formdata
article = input("Product:")
keywords = article.strip().replace(" ", "+")
URL_s = "https://www.ebay.de/sch/i.html?_dmd=1&_fosrp=1&LH_SALE_CURRENCY=0&_sop=12&_ipg=50&LH_Complete=1&LH_Sold=1&_sadis=10&_from=R40&_sacat=0&_nkw=" + keywords + "&_dcat=139971&rt=nc&LH_ItemCondition=3"
source = requests.get(URL_s).text
soup = BeautifulSoup(source, "html.parser")
prices = soup.findAll('span', class_='bold bidsold')
# ^ this line sometimes finds the prices, sometimes it just produces an empty list ^
Help would be very welcome. Hope you are doing well, bye bye 🙂
3 Answers
Maybe the prices are rendered by JavaScript. Requests does not execute JavaScript, so it only sees the HTML that the server sends initially.
That is why you should use other modules, such as Selenium or DryScrape.
If you look at the variable soup and open its contents as an HTML page, you will see that eBay has some sort of filtering mechanism to prevent scraping, and requires you to somehow confirm your identity. This is why your query for prices returns an empty list.
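One way to check this is to save the HTML you received and count how many price elements it contains. The following is only a rough sketch: the helper names summarize_response and save_for_inspection and the file name ebay_response.html are made up for illustration, and the selector is the one from the question, which may not match eBay's current markup.

```python
from bs4 import BeautifulSoup


def summarize_response(html):
    """Return the page title and the number of price elements found.

    A page with a title but zero prices is a hint that eBay served
    something other than the search results.
    """
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    # Selector taken from the question; assumed, not verified against eBay.
    prices = soup.find_all("span", class_="bold bidsold")
    return {"title": title, "price_count": len(prices)}


def save_for_inspection(html, path="ebay_response.html"):
    """Write the raw HTML to disk so it can be opened in a browser."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(html)
```

Calling summarize_response(source) right after requests.get(...) in the question's script, and save_for_inspection(source) whenever price_count is 0, should show what eBay actually returned in the failing runs.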
When using requests, the request may be blocked because the default user-agent in the requests library is python-requests. For the website to treat the request as coming from a real browser rather than a bot or script, you need to pass your real User-Agent in the headers.
You can also read the Reducing the chance of being blocked while web scraping blog post to learn about other options for solving this problem.
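Passing the header could look roughly like the sketch below. The User-Agent string is only an example value (use the one your own browser sends), the function names are hypothetical, and the query parameters are a subset of those in the question's URL. A custom User-Agent reduces the chance of being blocked but does not guarantee it.

```python
import requests

SEARCH_URL = "https://www.ebay.de/sch/i.html"

# Example browser User-Agent (an assumption); replace it with your own.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    )
}


def build_search_params(keywords):
    """Build the query parameters; requests handles the URL encoding."""
    return {
        "_nkw": keywords.strip(),
        "LH_Complete": "1",  # completed listings, as in the question's URL
        "LH_Sold": "1",      # sold listings, as in the question's URL
        "_ipg": "50",        # items per page, as in the question's URL
    }


def fetch_search_html(keywords, timeout=30):
    """Request the search page with a browser-like User-Agent."""
    response = requests.get(
        SEARCH_URL,
        params=build_search_params(keywords),
        headers=HEADERS,
        timeout=timeout,
    )
    response.raise_for_status()
    return response.text
```

Using params= instead of concatenating strings also removes the need for the manual replace(" ", "+") step, since requests encodes the spaces itself.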
If you want to collect the information from all pages, you can use a while loop that dynamically paginates through them. The while loop runs until a stop condition is met; in our case, the loop terminates when there is no next page, which is checked with the CSS selector “.pagination__next”.
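A minimal sketch of such a loop is shown below. Several things in it are assumptions rather than anything confirmed here: the _pgn page-number parameter, the example User-Agent value, the price selector (taken from the question), and the max_pages safety cap. The fetch argument is a hypothetical hook so the loop can be tried without touching the network.

```python
import requests
from bs4 import BeautifulSoup

SEARCH_URL = "https://www.ebay.de/sch/i.html"
# Example browser User-Agent (an assumption); replace it with your own.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_page(keywords, page):
    """Download one results page; _pgn as the page number is an assumption."""
    params = {"_nkw": keywords, "LH_Complete": "1", "LH_Sold": "1", "_pgn": page}
    response = requests.get(SEARCH_URL, params=params, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.text


def extract_prices(soup):
    """Collect price texts using the selector from the question."""
    tags = soup.find_all("span", class_="bold bidsold")
    return [tag.get_text(strip=True) for tag in tags]


def scrape_all_prices(keywords, fetch=fetch_page, max_pages=20):
    """Walk through result pages until no next-page element is present."""
    prices = []
    page = 1
    while True:
        soup = BeautifulSoup(fetch(keywords, page), "html.parser")
        prices.extend(extract_prices(soup))
        # Stop when the next-page element is missing or the cap is reached.
        if soup.select_one(".pagination__next") is None or page >= max_pages:
            break
        page += 1
    return prices
```

The cap is there because a site may keep rendering a next-page element on the last page; without it the loop could run far longer than intended.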
As an alternative, you can use Ebay Organic Results API from SerpApi. It’s a paid API with a free plan that handles blocks and parsing on their backend.
If you want to know more about website scraping, there is a blog post titled 13 ways to scrape any public data from any website.