Html - Problem finding number of elements in a list by searching for class attribute - BeautifulSoup

DCUpro
September 19, 2023

I’m trying to count the number of elements in a list using Beautiful Soup, but I’m getting an empty list back. I’m pretty sure this used to work for me, but it doesn’t anymore.

I’d appreciate any help or pointers from the gurus out there as I’m sure there is a better way. I’m completely new to this and feel a bit lost.

So if we take a nested list like below with 3 elements:

<div class="row">
...
<div class="style_details">
<ul data-id="list" class="listing_details">
<li data-id="listing-index-1"></li>
<li data-id="listing-index-2"></li>
<li data-id="listing-index-3"></li>
</ul>
</div>

and a snippet of code to count the list elements using the attribute ‘class="listing_details"’

browser.get(url)
c = browser.page_source
soup = BeautifulSoup(c, "html.parser")
dom = etree.HTML(str(soup))  
data = soup.findAll('li',attrs={'class':'listing_details'})
links = len(data)

return links

Is the class being nested in an unordered list causing the issue? Any ideas how to overcome this or a better way to count items on the list?

Answers

- JackFleeting
- September 13, 2023 at 1:52 pm
- 0 votes
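It seems you are using both BS and lxml. Your findAll call looks for <li> tags that themselves have class="listing_details", but that class is on the parent <ul>, so nothing matches. In either case, you should be counting the children of <ul data-id="list" class="listing_details">.

So with BS, it should be (using CSS selectors):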
```
data = soup.select('ul.listing_details li')
print(len(data))
```
and with lxml:
```
data2 = dom.xpath('//ul[@class="listing_details"]//li')
print(len(data2))
```
The output should be 3 in both cases.
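
To see why the original lookup comes back empty, here is a minimal, self-contained sketch that runs both lookups against the sample markup from the question. It assumes the HTML is already available as a string (the variable name html_text and the names wrong and right are only for illustration), so no browser is needed:

```python
from bs4 import BeautifulSoup

# Sample markup from the question, held in a string for illustration.
html_text = """
<div class="style_details">
<ul data-id="list" class="listing_details">
<li data-id="listing-index-1"></li>
<li data-id="listing-index-2"></li>
<li data-id="listing-index-3"></li>
</ul>
</div>
"""

soup = BeautifulSoup(html_text, "html.parser")

# The question's lookup: <li> tags that themselves carry the class.
# No <li> has class="listing_details", so this list is empty.
wrong = soup.find_all("li", attrs={"class": "listing_details"})

# The fix: <li> tags inside the <ul> that carries the class.
right = soup.select("ul.listing_details li")

print(len(wrong), len(right))  # 0 3
```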

If you want to select only the direct children, you can use the following example:

from bs4 import BeautifulSoup

html_text = """
<div class="row">
<div class="style_details">
<ul data-id="list" class="listing_details">
<li data-id="listing-index-1"></li>
<li data-id="listing-index-2"></li>
<li data-id="listing-index-3"></li>
</ul>
</div>"""

soup = BeautifulSoup(html_text, "html.parser")

# print only direct <li> under <ul class="listing_details">
# note the " > " in the CSS selector
for li in soup.select("ul.listing_details > li"):
    print(li)

Prints:

<li data-id="listing-index-1"></li>
<li data-id="listing-index-2"></li>
<li data-id="listing-index-3"></li>

Or, using the bs4 API:

ul = soup.find("ul", class_="listing_details")
for li in ul.find_all("li", recursive=False):
    print(li)
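
Since the question asks for a count rather than the elements themselves, the same idea can be wrapped in a small helper. This is only a sketch: count_direct_items and nested_html are hypothetical names made up for illustration, and the nested markup is invented to show where the direct-child and descendant lookups give different numbers.

```python
from bs4 import BeautifulSoup


def count_direct_items(html, list_class):
    """Count <li> tags that are direct children of the first <ul> with list_class."""
    soup = BeautifulSoup(html, "html.parser")
    ul = soup.find("ul", class_=list_class)
    if ul is None:
        # No matching <ul> on the page, so there is nothing to count.
        return 0
    return len(ul.find_all("li", recursive=False))


# Invented markup: the first item contains a nested list of its own.
nested_html = """
<ul class="listing_details">
<li>one
  <ul><li>nested</li></ul>
</li>
<li>two</li>
</ul>
"""

direct = count_direct_items(nested_html, "listing_details")
soup = BeautifulSoup(nested_html, "html.parser")
descendants = len(soup.select("ul.listing_details li"))

print(direct, descendants)  # 2 3
```

With the flat list from the question both approaches give 3, so the distinction only matters if the items can contain lists of their own.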
