I am trying to scrape a custom eBay search that shows 200 items on a single page. I need to get each item's title, price, and link, which works so far. But I would also like the code to follow the link to the next page (with 200 or fewer items) and extract those items as well.
This is the code I am using:
from urllib.request import urlopen as Req
from bs4 import BeautifulSoup as soup

start_url = 'https://www.ebay.de/sch/i.html?_fosrp=1&_from=R40&_nkw=iphone&_in_kw=1&_ex_kw=&_sacat=0&_mPrRngCbx=1&_udlo=600&_udhi=4.800&LH_BIN=1&LH_ItemCondition=4&_ftrt=901&_ftrv=1&_sabdlo=&_sabdhi=&_samilow=&_samihi=&_sadis=10&_fpos=&LH_SubLocation=1&_sargn=-1%26saslc%3D0&_fsradio2=%26LH_LocatedIn%3D1&_salic=77&_saact=77&LH_SALE_CURRENCY=0&_sop=2&_dmd=1&_ipg=200'

Client = Req(start_url)
page_html = Client.read()
Client.close()

page_soup = soup(page_html, "html.parser")
containers_listings = page_soup.findAll("li", {"class": "sresult lvresult clearfix li"})

container_next = page_soup.find("td", {"class": "pagn-next"})
next_url = container_next.a["href"]

filename = "scrape_ebay.csv"
f = open(filename, "w")
headers = "item_title,item_link,item_price\n"
f.write(headers)

for container in containers_listings:
    item_title = container.h3.text.strip()
    item_link = container.h3.a["href"].strip()
    item_price = container.span.text.strip()
    f.write(item_title + "," + item_link + "," + item_price.replace(",", ".") + "\n")

f.close()
I am running into a problem with the eBay pagination. I have managed to isolate and extract the next-page link, but I have no idea how to use it in a loop that would visit the next pages and extract their information. Any help would be greatly appreciated!
Thanks in advance.
3 Answers
You need to append new URLs to a list as you find them, and then keep iterating through that list, extracting the content you are looking for from each page.
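A minimal sketch of that queue-of-URLs approach, using a stub fetch function in place of real HTTP requests (the page names and items below are invented for illustration):

```python
# Sketch: crawl pages by appending newly discovered URLs to a work list.
# fetch() stands in for a real HTTP request + parse; here it returns a
# fake page as (items_on_page, next_page_url_or_None).
FAKE_SITE = {
    "page1": (["item A", "item B"], "page2"),
    "page2": (["item C"], "page3"),
    "page3": (["item D"], None),
}

def fetch(url):
    return FAKE_SITE[url]

def crawl(start_url):
    to_visit = [start_url]   # URLs still to be processed
    seen = set()             # guard against revisiting the same URL
    results = []
    while to_visit:
        url = to_visit.pop(0)
        if url in seen:
            continue
        seen.add(url)
        items, next_url = fetch(url)
        results.extend(items)
        if next_url is not None:
            to_visit.append(next_url)  # newly found URL joins the list
    return results

print(crawl("page1"))  # → ['item A', 'item B', 'item C', 'item D']
```

In a real script, fetch() would download the page, extract the listings, and return the href of the "next" link, exactly as isolated in the question's code.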
Your current link garners fewer than 200 results, so no pagination is given; however, a more popular search, such as listings for "macbooks", yields results on multiple pages. The link used for demonstration can be found here. To find the pages, the text of the full pagination a tags can be read, and when looping over those results, the page number at the current iteration can be concatenated onto the end of the link:

Output (first printed result):

However, if the input does not contain pagination links, only the first page will be accessible:
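The answer's original code blocks were not preserved; a sketch of the page-number concatenation approach it describes (the base URL below is illustrative, and the page numbers would really come from the pagination a tag texts) might look like:

```python
# Sketch: build one URL per results page by concatenating the page
# number onto the end of the base search link.
base_url = "https://www.ebay.de/sch/i.html?_nkw=macbook&_pgn="  # illustrative

def page_urls(base, page_numbers):
    """Concatenate each discovered page number onto the base link."""
    return [base + str(n) for n in sorted(set(page_numbers))]

# In the real script the numbers would be scraped from the pagination
# links, e.g. something like [a.text for a in soup.select("td.pagn a")].
urls = page_urls(base_url, [1, 2, 3, 2])
print(urls[0])  # → https://www.ebay.de/sch/i.html?_nkw=macbook&_pgn=1
```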
Update November 2019:
The solution is outdated. A possible way to handle Ebay pagination currently is below:
eBay has a _pgn URL parameter which is responsible for pagination. With that in mind, we can increment its value by 1 for the second, third, and subsequent pages.

We need to increment the page value only when a certain condition is met, and use a while loop to paginate through all possible pages dynamically, i.e. not a hardcoded for i in range(1, 100). Keep in mind that your requests.get() and soup calls need to be inside the while loop, as this is what lets the data update on each iteration.

To exit from the otherwise infinite while loop, we check on every iteration whether the next-page button is still active (this is how it works on eBay; on other websites it might be different) using the .pagination__next CSS selector. If it becomes inactive (there is no next page), we break out of the loop:

A prettier and more readable way is to use a params dict, which is a convenient way of updating URL search parameters:

With frequent requests, the site may start blocking you, since the default user-agent in the requests library is python-requests. You can learn more about ways to bypass blocking in the "Reducing the chance of being blocked while web scraping" blog post. See how pagination works in the online IDE.
Example output:
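The original code blocks from this answer were not preserved; the sketch below follows its description (a while loop over _pgn, a params dict, and a break on the .pagination__next selector), but treat the search URL, selectors, and result classes as assumptions. The network part is kept behind a main guard and requires the third-party requests and beautifulsoup4 packages:

```python
from urllib.parse import urlencode

def build_url(page):
    """Build the search URL for a given _pgn page number via a params dict."""
    params = {"_nkw": "iphone", "_pgn": page}
    return "https://www.ebay.de/sch/i.html?" + urlencode(params)

if __name__ == "__main__":
    import requests
    from bs4 import BeautifulSoup

    page = 1
    while True:  # dynamic pagination: no hardcoded range
        # requests.get() and the soup must live inside the loop so each
        # iteration fetches and parses the current page.
        resp = requests.get(build_url(page),
                            headers={"User-Agent": "Mozilla/5.0"})
        soup = BeautifulSoup(resp.text, "html.parser")

        for item in soup.select(".s-item"):  # listing selector: an assumption
            title = item.select_one(".s-item__title")
            price = item.select_one(".s-item__price")
            link = item.select_one("a.s-item__link")
            if title and price and link:
                print(title.text, price.text, link["href"])

        # Exit when the next-page button becomes inactive / disappears.
        if soup.select_one(".pagination__next") is None:
            break
        page += 1
```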
As another option, you can use the eBay Organic Results API from SerpApi. It's a paid API with a free plan that handles blocks and parsing on its backend.
Example code with pagination:
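The example code was not preserved; a hedged sketch of what paginated usage might look like follows (the engine name, parameter names, and result keys are assumptions based on SerpApi's eBay API, a real api_key is required, and the third-party google-search-results package provides GoogleSearch):

```python
def build_search_params(query, page, api_key):
    """Assemble the request parameters for one page of eBay results."""
    return {
        "engine": "ebay",         # SerpApi's eBay engine
        "ebay_domain": "ebay.de",
        "_nkw": query,            # search query
        "_pgn": page,             # page number, incremented per request
        "api_key": api_key,
    }

if __name__ == "__main__":
    from serpapi import GoogleSearch

    page = 1
    while True:
        search = GoogleSearch(build_search_params("iphone", page, "YOUR_API_KEY"))
        results = search.get_dict()
        organic = results.get("organic_results", [])
        if not organic:
            break  # no more results: stop paginating
        for item in organic:
            print(item.get("title"), item.get("price"), item.get("link"))
        page += 1
```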
Output: