So I’m trying to scrape some data from a website, and I want to extract the text inside the <li> tags as shown below. The problem is that they contain these ::marker entries, which I understand are pseudo-elements. Does that mean they can’t be parsed using BeautifulSoup?
<ul>
<li>
::marker
(text)
</li>
<li>
::marker
(text)
</li>
</ul>
This is what I tried, but it didn’t even return the other <li> tags that don’t contain the ::marker:
from bs4 import BeautifulSoup
import requests
url = "..."  # the link of the website
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
reference = soup.find("li")
print(reference.text)
#output is None
2 Answers
As there are multiple items, it is probably a good idea to use find_all and then iterate through those entries, calling get_text on each one. You could add some extra code to check that find_all actually returns at least one element.

You can use a CSS selector to extract the text content of the li elements; the ::marker pseudo-elements are not part of that text.
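Both suggestions can be sketched on the list from the question. This is only a minimal illustration, not the answerers’ original code: the inline HTML string stands in for page.text, and the item texts "first item" and "second item" are made-up placeholders. The ::marker lines are left out because the browser generates them and they are not part of the HTML source.

```python
from bs4 import BeautifulSoup

# Stand-in for page.text (assumption): the list from the question without the
# ::marker lines. The item texts are placeholders, not data from the real page.
html = """
<ul>
<li>first item</li>
<li>second item</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag, or an empty result if there are none.
items = soup.find_all("li")
if not items:
    print("no <li> elements found in the downloaded HTML")
texts_find_all = [li.get_text(strip=True) for li in items]

# The same elements, matched through a CSS selector instead.
texts_select = [li.get_text(strip=True) for li in soup.select("li")]

print(texts_find_all)
print(texts_select)
```

If the check reports that nothing was found on the real page, one possible cause is that the list is added by JavaScript after the page loads, in which case it is not in the HTML that requests receives.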
Note that this will extract the text content of all li elements. The ::marker pseudo-elements are drawn by the browser and never appear in the HTML that requests downloads, so a selector cannot include or exclude li elements based on them. If you only want some of the li elements, narrow the selector using something that is actually present in the markup, such as a class or the parent list.
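A selector can only match what is in the downloaded markup, so restricting the result means keying on an attribute or parent element. The sketch below assumes the wanted list carries a class named "references"; that class name and the item texts are hypothetical and do not come from the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class "references" and all item texts are
# assumptions made for illustration only.
html = """
<ul class="references">
<li>wanted one</li>
<li>wanted two</li>
</ul>
<ul>
<li>other item</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinator: only <li> elements directly inside <ul class="references">.
wanted = [li.get_text(strip=True) for li in soup.select("ul.references > li")]
print(wanted)
```

On a real page, the class or id to key on would have to be read from the page source (not the rendered DevTools view), since that is what BeautifulSoup sees.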