
I’m trying to practice web scraping locally using requests-html and an eBay page.
Whenever I try to return the text of an element, all I get is a ton of text I didn’t want (I think it is the entire text of the page).
This is the URL I downloaded for use offline: https://www.ebay.com/sch/CPUs-Processors/164/i.html?_geositeid=0&_sop=1&_dmd=1&_ipg=240&_fosrp=1&_oaa=1&_nkw=%22ryzen%22&_dcat=164&rt=nc&LH_ItemCondition=1000%7C1500%7C2500%7C3000&_udlo&_udhi=110&LH_AllListings=1&LH_PrefLoc=3

So far, I’ve been able to reproduce the output by just importing the library, defining the html to be parsed, and then running the code snippet below.

Below is my code, where I read .text from an element that is supposed to return "Pre-Owned".

from requests_html import HTML

doc = '''<The entire HTML directly copied from view source on the website>'''

html = HTML(html=doc)

# Printing the element itself shows an object of type Element, the right element I believe.
element = html.find('#item2b11be4226 > div:nth-child(3)', first=True)
# Printing its text, however, returns all kinds of text not associated with that element.
print(element.text)

But instead the output looks like this:

Brand New
$100.00
or Best Offer
Free shipping
3 Watching
Watch
AMD Ryzen 9 3900X box only
Open Box
$10.00
or Best Offer
+$7.65 shipping
Watch
AMD CPU Ryzen 5 1600 3.2GHz Processor
Pre-Owned
1 product rating
$90.00
or Best Offer
+$20.00 shipping
Watch
AMD RYZEN 5 2600 fan only read description
Open Box
$25.00
or Best Offer
+$10.00 shipping
Watch
New listing AMD YD190XA8AEWOF Ryzen Threadripper 1900X (8-core/16-thread) Desktop Processor
Pre-Owned
6 product ratings
$109.99
Buy It Now
+$13.18 shipping
Watch
5 new & refurbished from $190.00
New listing AMD Ryzen 3 1200 3.1GHz Quad-Core Processor (YD1200BBAEBOX) With Cooler In Box
New (Other)
58 product ratings
$82.25
or Best Offer
Shipping not specified
Watch
2 new & refurbished from $236.99
From Canada
Tell us what you think
eBay determines this price through a machine learned model of the product's sale prices within the last 90 days.
eBay determines trending price through a machine learned model of the product’s sale prices within the last 90 days. "New" refers to a brand-new, unused, unopened, undamaged item, and "Used" refers to an item that has been used previously.
Top Rated Plus

Sellers with highest buyer ratings
Returns, money back
Ships in a business day with tracking
Learn More
Top Rated Plus

Sellers with highest buyer ratings
Returns, money back
Ships in a business day with tracking
Learn More
Search refinements
Categories
All
Computers/Tablets & Networking (1,149)
Computer Components & Parts (1,013)
Fans, Heat Sinks & Cooling (418)
Motherboards (299)
CPUs/Processors (134)
Memory (RAM) (62)
Laptop Replacement Parts (29)
Other Components & Parts (24)
Power Supplies (20)
Motherboard Components & Accs (13)
Motherboard & CPU Combos (7)
Interface/Add-On Cards (6)
Computer Cases & Accessories (1)
Graphics/Video Cards (1)
More
Clothing, Shoes & Accessories (182)
eBay Motors (109)
Home & Garden (31)
Consumer Electronics (25)
Collectibles (5)
Business & Industrial (4)
Crafts (2)
Video Games & Consoles (1)
Sporting Goods (1)
Books & Magazines (1)
Show more
Number of Cores
see allNumber of Cores
12 (2)
4 (21)
6 (41)
8 (26)
Socket Type
see allSocket Type
Socket AM2 (5)
Socket AM2+ (5)
Socket AM3 (4)
Socket AM4 (88)
Processor Type
see allProcessor Type
Ryzen 3 (17)
Ryzen 5 (51)
Ryzen 7 (27)
Ryzen 9 (4)
Ryzen Threadripper (2)
Brand
see allBrand
AMD (110)
Unbranded (21)
Processor Model
see allProcessor Model
AMD A10-5700 (13)
AMD Ryzen 3 2200G (6)
AMD Ryzen 5 1600 (7)
AMD Ryzen 5 2400G (1)
AMD Ryzen 5 2600 (14)
AMD Ryzen 5 2600X (5)
AMD Ryzen 7 1700 (8)
AMD Ryzen 7 2700X (7)
L3 Cache
see allL3 Cache
16 MB (44)
4 MB (8)
3 MB (5)
32 MB (5)
8 MB (5)
L2 Cache
see allL2 Cache
4 MB (23)
3 MB (22)
2 MB (17)
Bus Speed
see allBus Speed
100 MHz (15)
3200 MHz (8)
3400 MHz (3)
4800 MHz (8)
Clock Speed
see allClock Speed
More than 3.5 GHz (53)
Not Specified (51)
Guaranteed Delivery
see allGuaranteed Delivery
No Preference(filter applied)
1 Day Shipping
2 Day Shipping
3 Day Shipping
4 Day Shipping
Condition
see allCondition
New(filter applied)
Open box(filter applied)
Seller refurbished(filter applied)
Used(filter applied)
Price
Please provide a valid price range
Under $110.00 (filter applied)
$ Enter minimum price to $ Enter maximum price


Format
see allFormat
All Listings(filter applied) (134)
Auction (78)
Buy It Now (67)
Item Location
see allItem Location
Default
Within
Within
2 miles
5 miles
10 miles
15 miles
25 miles
50 miles
75 miles
100 miles
150 miles
200 miles
500 miles
750 miles
1000 miles
1500 miles
2000 miles
of Enter your ZIP code  Go
Please enter a valid zipcode
US Only
North America(filter applied)
Worldwide
see allSeller
Seller
Delivery Options
see allDelivery Options
Free shipping
Show only
see allShow only
Free Returns
Returns accepted
Completed listings
Sold listings
Deals & Savings
More refinements...
Additional navigation

There’s more, but the output already exceeds the maximum number of characters for this post.

3 Answers


  1. I had a similar problem.

    If your Python version is 3.9, you might try downgrading to 3.8.

    This solved the problem for me.

  2. This was happening to me too; it didn’t work with Python 3.9.
    It drove me crazy for two days.

    I downgraded to Python 3.6 and now it works fine.
    Downgrading just in a local environment is enough.

  3. Use .text on the "element" attribute of the result returned by .find():

    result = html.find('#item2b11be4226 > div:nth-child(3)', first=True)
    result_text = result.element.text
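
A rough illustration of why the snippet in answer 3 can return less text: requests-html exposes the underlying lxml element as .element, and lxml follows the ElementTree API, where .text holds only the text that comes before the first child element. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in for lxml, and the markup is a made-up simplification (the id, class names, and structure are assumptions, not eBay's actual HTML).

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for one listing; not eBay's real markup.
snippet = """
<li id="item1">
  <h3>AMD CPU Ryzen 5 1600</h3>
  <div class="condition">Pre-Owned<span> (1 product rating)</span></div>
  <div class="price">$90.00</div>
</li>
""".strip()

root = ET.fromstring(snippet)
condition = root.find("div[@class='condition']")

# .text: only the text that precedes the element's first child.
direct_text = condition.text

# itertext(): text of the element and all of its descendants, in document order.
full_text = "".join(condition.itertext())

print(direct_text)
print(full_text)
```

If the wanted text sits inside a child element rather than directly in the selected one, .text alone may come back as None, so joining itertext() (or lxml's text_content()) may be the better fit there.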
    