I’m trying to count the number of elements in a list using Beautiful Soup, but I’m getting an empty list back. I’m pretty sure this used to work for me, but it no longer does.

I’d appreciate any help or pointers from the gurus out there as I’m sure there is a better way. I’m completely new to this and feel a bit lost.

So if we take a nested list like the one below, with 3 elements:

<div class="row">
...
<div class="style_details">
<ul data-id="list" class="listing_details">
<li data-id="listing-index-1"></li>
<li data-id="listing-index-2"></li>
<li data-id="listing-index-3"></li>
</ul>
</div>

and a snippet of code that tries to count the list elements using the attribute ‘class="listing_details"’:

browser.get(url)
c = browser.page_source
soup = BeautifulSoup(c, "html.parser")
dom = etree.HTML(str(soup))  
data = soup.findAll('li',attrs={'class':'listing_details'})
links = len(data)

return links

Is the class being nested in an unordered list causing the issue? Any ideas how to overcome this or a better way to count items on the list?

2 Answers


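  1. It seems you are using both BS and lxml. Your search returns an empty list because the class listing_details is set on the <ul>, not on the <li> elements. In either case, you should count the <li> elements inside <ul data-id="list" class="listing_details">.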

    So with BS, it should be (using css selectors):

    data = soup.select('ul.listing_details li')
    print(len(data))
    

    and with lxml:

    data2 = dom.xpath('//ul[@class="listing_details"]//li')
    print(len(data2))
    

    The output should be 3 in both cases.

  2. If you want to select only direct children, you can use the following example:

    from bs4 import BeautifulSoup
    
    html_text = """
    <div class="row">
    <div class="style_details">
    <ul data-id="list" class="listing_details">
    <li data-id="listing-index-1"></li>
    <li data-id="listing-index-2"></li>
    <li data-id="listing-index-3"></li>
    </ul>
    </div>"""
    
    soup = BeautifulSoup(html_text, "html.parser")
    
    # print only direct <li> under <ul class="listing_details">
    # note the " > " in the CSS selector
    for li in soup.select("ul.listing_details > li"):
        print(li)
    

    Prints:

    <li data-id="listing-index-1"></li>
    <li data-id="listing-index-2"></li>
    <li data-id="listing-index-3"></li>
    

    Or, using the bs4 API:

    ul = soup.find("ul", class_="listing_details")
    for li in ul.find_all("li", recursive=False):
        print(li)
    
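To tie the two answers together, here is a minimal sketch (not taken from either answer) that compares the question's original query with the descendant and direct-child selectors. The nested <ul> inside the second item and the helper name count_items are assumptions added purely for illustration; the question's own markup has no nested list, in which case both selectors would give the same count.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the question's list, plus one nested <ul>
# added here only to show how the two selectors differ.
HTML = """
<div class="style_details">
<ul data-id="list" class="listing_details">
<li data-id="listing-index-1"></li>
<li data-id="listing-index-2"><ul><li>nested</li></ul></li>
<li data-id="listing-index-3"></li>
</ul>
</div>
"""


def count_items(html, direct_only=True):
    """Count <li> elements under <ul class="listing_details">.

    direct_only=True  -> only immediate children (" > " combinator)
    direct_only=False -> every descendant <li>, nested ones included
    """
    soup = BeautifulSoup(html, "html.parser")
    selector = "ul.listing_details > li" if direct_only else "ul.listing_details li"
    return len(soup.select(selector))


soup = BeautifulSoup(HTML, "html.parser")

# The question's query: looks for <li> elements that carry the class
# themselves, but the class is on the <ul>, so nothing matches.
original = len(soup.find_all("li", attrs={"class": "listing_details"}))

print(original)                              # question's query
print(count_items(HTML))                     # direct children only
print(count_items(HTML, direct_only=False))  # all descendants
```

With the question's real page, which selector to use depends on whether the listing items can themselves contain lists; the direct-child form is the safer default when only the top-level entries should be counted.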