
I am new to programming and also new to Python.

My intention is to build an eBay web scraper.

I am trying to extract a list of links with the bs4 find_all() method, but no matter what I try, it always returns an empty list.

def get_index_data(soup):

    try:
        links = soup.find_all('a', {'class': 's-item__link'})
        print(links)
    except:
        links = []
        print(links)

I also wrote it this way:

links = soup.find_all('a', class_= 's-item__link')

It also returns an empty list. I absolutely don't know what is wrong.

Edit:

import requests
from bs4 import BeautifulSoup


def get_page(url):

    response = requests.get(url)

    if not response.ok:
        print('server responded: ', response.status_code)
    else:
        soup = BeautifulSoup(response.text, 'lxml')
    return soup


def get_index_data(soup):
    links = soup.find_all('a')

    print(links)


def main():

    url = 'https://www.ebay.de/sch/i.html?_nkw=armbanduhr&_pgn=1 '
    get_index_data(get_page(url))


if __name__ == '__main__':
    main()

Edit 2:

Error after running the code with only .find_all('a'):


Traceback (most recent call last):
  File "C:\Users\Aleksandar\Desktop\My ebay scraper\test", line 29, in <module>
    main()
  File "C:\Users\Aleksandar\Desktop\My ebay scraper\test", line 25, in main
    get_index_data(get_page(url))
  File "C:\Users\Aleksandar\Desktop\My ebay scraper\test", line 19, in get_index_data
    print(links)
  File "C:\Users\Aleksandar\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2705' in position 28776: character maps to <undefined>

Answers


  1. Your code does not show us the URL that you are trying to parse.

    Please try to understand the concepts by parsing one simple page first.

    eBay uses JavaScript, so it is a little harder to scrape.

    I will write down a simple example below.

    Hope that helps you understand some concepts.

    from bs4 import BeautifulSoup
    import requests
    
    page = "https://en.wikipedia.org/wiki/Main_Page"
    
    page_text = requests.get(page).text
    
    soup = BeautifulSoup(page_text, 'lxml')
    
    # print(soup)
    links = []
    links = soup.find_all("a")
    
    for link in links:
        print(link)
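    
    # (Not part of the original answer: a small follow-up sketch.)
    # If you only want the link targets rather than the whole <a> tags,
    # read the href attribute of each result:
    for link in links:
        href = link.get("href")  # None when an <a> tag has no href attribute
        if href:
            print(href)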
    
  2. BeautifulSoup supports a few different parsers for different situations. In the past I have stuck with "html.parser" instead of "lxml". Sometimes using "lxml" will actually return None in a situation where "html.parser" returns a result.

    That could be why you get your error messages and the empty result, so I'd try that. When I wrote up something similar to yours it worked. Since the a tag is used a lot, you're probably going to get a huge chunk of markup to parse through, but if you switch from "lxml" to "html.parser" it should work; a sketch of that change is below.

    Web scraping can be tough to get the hang of, but once you do it's really fun. There are some really great videos about BeautifulSoup on YouTube.
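
    A minimal sketch of that change, based on the get_page() function from the question (untested against eBay, so treat it as an illustration of the parser swap rather than a guaranteed fix):

    import requests
    from bs4 import BeautifulSoup

    def get_page(url):
        response = requests.get(url)
        if not response.ok:
            print('server responded: ', response.status_code)
            return None
        # "html.parser" ships with Python, so no extra install is needed
        return BeautifulSoup(response.text, 'html.parser')

    soup = get_page('https://www.ebay.de/sch/i.html?_nkw=armbanduhr&_pgn=1')
    if soup is not None:
        links = soup.find_all('a', class_='s-item__link')
        print(len(links))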
