I am trying to collect data from shoes on eBay. For every item I want to collect all data including the custom description to build up a database. I have collected all aspects such as price, shipping title etc. with requests and BS4. Unfortunately, the only thing missing is the custom item description.
This seems to be an event html, which in a browser is loaded automatically, but not with requests and BS4. I would prefer to do it with requests and BS4, as the script is almost ready, and scraping through for example Selenium is much slower. The example I am working on is as follows:
from bs4 import BeautifulSoup as soup
import requests
source=requests.get("https://www.ebay.com/itm/SIGNED-2000-NIKE-AIR-JORDAN-1-HIGH-BANNED-1985-ROOKIE-RETRO-SHOES-AUTOGRAPH-UDA/392861887827?hash=item5b7864ad53:g:njcAAOSw9rpfALFX")
Nike_shoe = soup(source.text, "lxml")
The description part I am trying to filter contains, among other lines, the following excerpt:
This can be found a bit lower on the eBay page. This description is part of the following HTML structure:
When I scan through the Nike_shoe soup, this text is not present. I have tried to parse the source.text as lxml, html.parser, html5lib and xml.
I have also tried to use Requests-HTML package which should have full JavaScript support:
from requests_html import HTMLSession
session = HTMLSession()
source = session.get('https://www.ebay.com/itm/SIGNED-2000-NIKE-AIR-JORDAN-1-HIGH-BANNED-1985-ROOKIE-RETRO-SHOES-AUTOGRAPH-UDA/392861887827?hash=item5b7864ad53:g:njcAAOSw9rpfALFX')
Nike_shoe=soup(source.text, "html5lib")
But unfortunately, I was still not able to retrieve this data. Also I am not familiar with this package, so perhaps I am doing something wrong.
Edit 22/08/2020 13:41:
Both answers below (@Andrej Kesely & @p1xel) give correct results. p1xel his answer can be implemented as follows:
source=requests.get("https://www.ebay.com/itm/SIGNED-2000-NIKE-AIR-JORDAN-1-HIGH-BANNED-1985-ROOKIE-RETRO-SHOES-AUTOGRAPH-UDA/392861887827?hash=item5b7864ad53:g:njcAAOSw9rpfALFX")
Nike_shoe = soup(source.text, "lxml")
iframe=requests.get(Nike_shoe.find(id="desc_ifr")["src"])
Custom_description = soup(iframe.text, "html5lib")
print(Custom_description.find("td").text
SIGNED 2000NIKE AIR JORDAN 1 HIGH BANNED 1985 ROOKIE RETRO SHOES AUTOGRAPH UDA SIGNED IN PRESENCE OF UPPER DECK REPRESENTATIVES• SHOES ARE OFFICIAL RETRO FROM 2000, BRAND NEW WITH ORIGINAL BOX AND RETRO CARD Beautiful signature accompanied by CERTIFICATE OF AUTHENTICITY FROM THE UPPER DECK COMPANY, which currently HOLDS AN EXCLUSIVE RIGHTS to ALL authorized authentic Jordan autographed memorabilia & trading cards (No 3rd party authentication here!)Have a peace of mind knowing that YOU ARE GETTING THE REAL DEAL
RECENTLY ACQUIRED BIG COLLECTION FROM A PRIVATE COLLECTOR, PLEASE CHECK OUR AUCTION PERIODICALLY AS WE WILL CONTINUE TO POST NEW ITEMS DAILYIt says on the certificate that "Each individual product that bears the original autograph is signed in the presence of an Upper Deck Authenticated representative and registered by its numbered hologram and kept on permanent file", as part of UDA's patented 5-Step Hologram process. (NO LETTER OF OPINION HERE!)
Pictures are from the actual shoe you are bidding on.... BUY FROM A REPUTABLE COLLECTOR, Please check my feedbacks from previous satisfied buyers and bid with confidence.BUYER TO PAY $100 FOR FULLY INSURED shipping with tracking number & signature confirmation. International buyer are responsible for any import/customs duty fee that might be charged upon delivery of the packageRECENTLY ACQUIRED BIG COLLECTION FROM A PRIVATE COLLECTOR, PLEASE CHECK OUR AUCTION PERIODICALLY AS WE WILL CONTINUE TO POST NEW ITEMS DAILYALL SALES ARE FINAL. MAKE SURE TO CHECK MY OTHER AUCTIONS FOR MORE GREAT MJ MEMORABILIA
As p1xel his answer is completed through the requests format on the same page, this will be chosen as the accepted solution, but both solutions are fine.
2
Answers
It appears the description is within an iframe.
You need to find the
iframe
with iddesc_ifr
and simply make a request to itssrc
.This should do what you want (untested):
requests.get(Nike_shoe.find(id="desc_ifr")["src"])
The description is loaded from different URL, you only need item Id, in this case number
392861887827
:Prints: