I want to find all the li elements nested within <ol class="messageList" id="messageList">. I have tried the following approaches, and they all return 0 messages:
messages = soup.find_all("ol")
messages = soup.find_all('div', class_='messageContent')
messages = soup.find_all("li")
messages = soup.select('ol > li')
messages = soup.select('.messageList > li')
The full HTML can be seen in this gist.
- What is the correct way of grabbing these list items?
- In Beautiful Soup, do you have to know the nested path to get the element you are after, or is something like soup.find_all("li") supposed to return all matching elements, whether they are nested or not?
Happy for non-bs4 answers too.
Update
This is how I got the code.
from bs4 import BeautifulSoup
# Load the HTML content
with open('/tmp/property.html', 'r', encoding='utf-8') as file:
    html_content = file.read()
# Create a BeautifulSoup object and specify the parser
soup = BeautifulSoup(html_content, 'html.parser')
The file is in the gist link above.
Update 2
I got it working using the requests library. It looks like manually downloading the file might have broken some of the HTML?
import requests
from bs4 import BeautifulSoup
url = "https://www.propertychat.com.au/community/threads/melbourne-property-market-2024.75213/"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
messages = soup.select('.messageList > li')
2 Answers
Thank you for offering example code + data.
This will pick out the various list item elements you wanted:
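A minimal sketch of such a lookup, assuming the list can be found by its id. The inline HTML, the post ids, and the message text are hypothetical stand-ins for the real page in the gist:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the thread page; the real markup is in the gist.
html = """
<ol class="messageList" id="messageList">
  <li id="post-1" class="message"><div class="messageContent">First post</div></li>
  <li id="post-2" class="message"><div class="messageContent">Second post</div></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")

# Look the list up by its id, then take only its direct <li> children.
message_list = soup.find("ol", id="messageList")
messages = message_list.find_all("li", recursive=False) if message_list else []

for li in messages:
    print(li.get("id"), li.get_text(strip=True))
```

If message_list comes back as None, the parsed document does not contain the list at all, which points at the input file rather than at the selector.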
You could certainly ask for soup.find_all("li"). That would retrieve all list items in the document, even if they are under some other <ul> that you append to the document. I started out looping over all the <ol>s, until I noticed the document only has one of them.
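To illustrate the difference between searching all descendants and searching direct children only, here is a small sketch. The markup is made up for the example; find_all searches recursively by default, while recursive=False and the CSS child combinator (>) restrict the match to direct children:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <li> directly under <ol>, two more nested in a <ul>.
html = """
<ol id="messageList">
  <li>outer
    <ul>
      <li>inner one</li>
      <li>inner two</li>
    </ul>
  </li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")

all_items = soup.find_all("li")                          # every descendant <li>
direct_items = soup.ol.find_all("li", recursive=False)   # children of <ol> only
css_direct = soup.select("ol > li")                      # CSS child combinator

print(len(all_items), len(direct_items), len(css_direct))  # 3 1 1
```

So you do not need to know the nested path: a plain find_all("li") finds nested items too, and an empty result usually means the items are missing from the parsed input.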
Typically I will write nested loops corresponding
to the nesting of the document’s elements,
but you certainly don’t have to.
It’s just easier to make sense of the results that way,
since you have context about which container the element came from.
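The nested-loop style described above might look like the following sketch. The sample markup and ids are assumptions for illustration; the point is that each li is reported together with the container it came from, so the unrelated sidebar item is never picked up:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with one message list and one unrelated list.
html = """
<ol id="messageList">
  <li id="post-1">First post</li>
  <li id="post-2">Second post</li>
</ol>
<ul id="sidebar">
  <li id="nav-1">Unrelated link</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

found = []
# Outer loop follows the containers, inner loop follows their direct items.
for ol in soup.find_all("ol"):
    for li in ol.find_all("li", recursive=False):
        found.append((ol.get("id"), li.get("id")))
        print(f"{ol.get('id')}: {li.get('id')} -> {li.get_text(strip=True)}")
```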
Maybe this is what you’re looking for?
This code uses requests and bs4 to find the ol element you mentioned. It then collects the li elements contained within that ol element into a list called li_list and prints its contents.
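A sketch of that approach, under the assumption that the list is identified by id="messageList" as in the question. The helper names extract_li_list and fetch_li_list are hypothetical; splitting parsing from downloading lets the parsing step be checked against a saved file without network access:

```python
from bs4 import BeautifulSoup


def extract_li_list(html):
    """Return the direct <li> children of <ol id="messageList">, or [] if absent."""
    soup = BeautifulSoup(html, "html.parser")
    ol = soup.find("ol", id="messageList")
    if ol is None:
        return []
    return ol.find_all("li", recursive=False)


def fetch_li_list(url):
    """Download the page with requests and extract the list items."""
    import requests  # imported here so extract_li_list also works offline

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return extract_li_list(response.text)
```

Calling li_list = fetch_li_list(url) with the thread URL from the question and then print(li_list) would reproduce the behaviour described; passing the contents of /tmp/property.html to extract_li_list shows whether the saved copy still contains the list.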