I am new to programming and also new to Python.
My intention is to build an eBay web scraper.
I am trying to extract a list of links with the bs4 find_all() method, but no matter what I try, it always returns an empty list.
```python
def get_index_data(soup):
    try:
        links = soup.find_all('a', {'class': 's-item__link'})
        print(links)
    except:
        links = []
        print(links)
```
I also tried writing it this way:
```python
links = soup.find_all('a', class_='s-item__link')
```
It also returns an empty list. I have absolutely no idea what is wrong.
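One way to narrow this down is to run both spellings against a small hard-coded snippet. The markup here is a hypothetical imitation of a search-result item, not eBay's actual HTML. It assumes bs4 is installed and uses the built-in "html.parser", so lxml is not needed. If both forms find the links here but nothing on the downloaded page, the find_all() call is probably fine and the downloaded HTML is what differs.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like a search-result list; not copied from eBay.
SAMPLE_HTML = """
<ul>
  <li class="s-item"><a class="s-item__link" href="https://example.com/itm/1">Watch 1</a></li>
  <li class="s-item"><a class="s-item__link" href="https://example.com/itm/2">Watch 2</a></li>
  <li><a class="other" href="https://example.com/help">Help</a></li>
</ul>
"""

def find_links_dict(soup):
    # Attribute-dictionary form, as in the question
    return soup.find_all("a", {"class": "s-item__link"})

def find_links_kwarg(soup):
    # Keyword form; class_ has a trailing underscore because "class" is a Python keyword
    return soup.find_all("a", class_="s-item__link")

if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    for a in find_links_kwarg(soup):
        print(a["href"])
```

Both forms match any <a> whose class list contains s-item__link, so they return the same tags.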
Edit:
```python
import requests
from bs4 import BeautifulSoup

def get_page(url):
    response = requests.get(url)
    if not response.ok:
        print('server responded: ', response.status_code)
    else:
        soup = BeautifulSoup(response.text, 'lxml')
        return soup

def get_index_data(soup):
    links = soup.find_all('a')
    print(links)

def main():
    url = 'https://www.ebay.de/sch/i.html?_nkw=armbanduhr&_pgn=1 '
    get_index_data(get_page(url))

if __name__ == '__main__':
    main()
```
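Two things in this version are worth guarding against. get_page() returns None when the response is not OK, and get_index_data() would then fail on None.find_all. The sketch below checks for that and returns only the link URLs instead of printing whole tags. The User-Agent header is an assumption: some sites may serve a different or blocked page to clients that do not look like a browser, and the header value is only a placeholder. It also uses "html.parser" so it runs without lxml installed.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder browser-like header; whether eBay needs one is an assumption.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

def get_page(url):
    """Return a BeautifulSoup object, or None if the server did not answer OK."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    if not response.ok:
        print("server responded:", response.status_code)
        return None
    return BeautifulSoup(response.text, "html.parser")

def get_index_data(soup):
    """Return the href of every <a class="s-item__link">, or [] if there is no page."""
    if soup is None:
        return []
    return [a.get("href") for a in soup.find_all("a", class_="s-item__link")]

def main():
    url = "https://www.ebay.de/sch/i.html?_nkw=armbanduhr&_pgn=1"
    links = get_index_data(get_page(url))
    print(len(links), "links found")
    for link in links:
        print(link)

if __name__ == "__main__":
    main()
```

If this still reports 0 links, the class name is probably not present in the HTML that requests receives.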
Edit 2:
This is the error I get after running the code with only .find_all('a'):
```
Traceback (most recent call last):
  File "C:\Users\Aleksandar\Desktop\My ebay scraper\test", line 29, in <module>
    main()
  File "C:\Users\Aleksandar\Desktop\My ebay scraper\test", line 25, in main
    get_index_data(get_page(url))
  File "C:\Users\Aleksandar\Desktop\My ebay scraper\test", line 19, in get_index_data
    print(links)
  File "C:\Users\Aleksandar\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2705' in position 28776: character maps to <undefined>
```
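This traceback ends in print(links), not in find_all(): the page was parsed, but the output stream encodes text as cp1252, which has no character for U+2705 (the ✅ emoji somewhere in the printed tags). This typically happens when the script runs inside an editor or with redirected output on Windows. A minimal sketch of two standard-library workarounds follows; the helper names are illustrative. One switches stdout to UTF-8 with TextIOWrapper.reconfigure() (available since Python 3.7, so it covers the Python 3.8 in the traceback). The other replaces unencodable characters before printing.

```python
import sys

def use_utf8_stdout():
    """Switch stdout to UTF-8 if the stream supports reconfigure() (Python 3.7+)."""
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8")

def printable(text, encoding="cp1252"):
    """Replace characters the given encoding cannot represent with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)

if __name__ == "__main__":
    use_utf8_stdout()
    print("\u2705 Armbanduhr")             # should no longer raise once stdout is UTF-8
    print(printable("\u2705 Armbanduhr"))  # prints "? Armbanduhr"
```

Setting the environment variable PYTHONIOENCODING=utf-8 before starting Python has a similar effect without changing the code.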
Answers
Your code does not show us the URL you are trying to parse.
Please try to understand the concepts by parsing one simple page first.
eBay uses JavaScript, which makes it a bit harder to scrape.
I will write down a simple one.
I hope that helps you understand some of the concepts.
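Because eBay relies on JavaScript, anything a script adds after the page loads is not part of response.text, and that raw text is all BeautifulSoup ever sees. A quick diagnostic, sketched with a hypothetical helper check_markup(), is to compare what the parser finds with a plain substring search of the raw HTML. If the class name does not occur in the source at all, no selector will find it. If it does occur but nothing matches, the selector is the suspect.

```python
import requests
from bs4 import BeautifulSoup

def check_markup(html, css_class):
    """Compare a raw substring search with what BeautifulSoup finds."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "anchors": len(soup.find_all("a")),                       # all <a> tags
        "matching": len(soup.find_all("a", class_=css_class)),    # <a> with the class
        "class_in_source": css_class in html,                     # raw text contains it?
    }

if __name__ == "__main__":
    url = "https://www.ebay.de/sch/i.html?_nkw=armbanduhr&_pgn=1"
    html = requests.get(url, timeout=10).text
    print(check_markup(html, "s-item__link"))
```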
BeautifulSoup has a few different types of parsers for different situations. In the past I have stuck with the “html.parser” instead of “lxml”. Sometimes using “lxml” will actually return None in a situation where “html.parser” will return a result.
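The parser is simply the second argument to the BeautifulSoup constructor. "html.parser" ships with Python, while "lxml" is a separate package, and bs4 raises FeatureNotFound when the requested parser is not installed. A small sketch that prefers lxml and falls back to the built-in parser (make_soup is a hypothetical helper name):

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(html, preferred="lxml"):
    """Parse with the preferred parser, falling back to Python's built-in one."""
    try:
        return BeautifulSoup(html, preferred)
    except FeatureNotFound:
        # The requested parser library is not installed
        return BeautifulSoup(html, "html.parser")

if __name__ == "__main__":
    soup = make_soup('<a class="s-item__link" href="/itm/1">Uhr</a>')
    print(soup.find("a", class_="s-item__link")["href"])  # /itm/1
```

Parsers can build slightly different trees from malformed HTML, so results may differ between them on messy pages.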
That could be why you get the error message and the empty result, so I’d try that. When I wrote something similar to your code, it worked. Since the a tag is used a lot, you’re probably going to get a huge chunk of output to parse through, but if you change from lxml to html.parser it should work!
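Because find_all('a') matches every link on the page (navigation, footer, ads), it usually helps to keep only the href values and filter them. This sketch keeps links whose URL contains "/itm/", which is an assumption about how eBay item URLs look and should be checked against the real page. item_links is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

def item_links(soup, marker="/itm/"):
    """Return unique hrefs containing marker, in page order."""
    # href=True skips <a> tags that have no href attribute
    hrefs = (a["href"] for a in soup.find_all("a", href=True))
    # dict.fromkeys removes duplicates while keeping insertion order
    return list(dict.fromkeys(h for h in hrefs if marker in h))

if __name__ == "__main__":
    html = '<a href="https://www.ebay.de/itm/1">Uhr</a><a href="/help">Hilfe</a>'
    print(item_links(BeautifulSoup(html, "html.parser")))
```

Printing a short list of URLs is also much easier to read than printing every tag.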
Web scraping can be tough to get the hang of, but once you do, it’s really fun. There are some great videos about BeautifulSoup on YouTube.