Html - How to parse and extract the tags containing ::marker?

LiMinHeng
May 5, 2024
262 views
0 votes
2 Answers

So I’m trying to scrape some data from a website, and I want to extract the text inside the <li> tags as shown below. The problem is that they contain these ::marker entries, which I understand are pseudo-elements, so they can’t be parsed using BeautifulSoup?

<ul>
    <li>
        ::marker
        (text)
    </li>
    <li>
        ::marker
        (text)
    </li>
</ul>

This is what I tried, but it didn’t return the other <li> tags that don’t contain the ::marker:

from bs4 import BeautifulSoup
import requests 


url = "<the link of the website>"  # placeholder: the actual URL is not given
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

reference = soup.find("li")
print(reference.text) 

#output is None

Answers

- oldboy
- May 5, 2024 at 5:43 pm
- 0 votes
As there are multiple items, it is probably a good idea to use find_all and then iterate through the entries, calling get_text on each one. Something like:
```
list_items = soup.find_all("li")
for element in list_items:
    print(element.get_text())
```
You could add some extra code to check that find_all does actually return at least one element.
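A minimal sketch of that check, assuming the page HTML is already available as a string. The helper name `list_item_texts` and the sample HTML are made up for illustration; `find_all` returns an empty list (not None) when nothing matches, so a plain truthiness test is enough.

```python
from bs4 import BeautifulSoup


def list_item_texts(html):
    """Return the stripped text of every <li> in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    list_items = soup.find_all("li")
    if not list_items:
        # find_all gives back an empty result set when there is no match
        return []
    return [element.get_text(strip=True) for element in list_items]


# Hypothetical sample standing in for page.text
sample_html = "<ul><li>first</li><li>second</li></ul>"

texts = list_item_texts(sample_html)
if texts:
    for text in texts:
        print(text)
else:
    print("No <li> elements found")
```

If the "No <li> elements found" branch is hit on the real page, it is worth printing part of page.text to see what the server actually sent.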

- AliAkbar
- May 5, 2024 at 5:50 pm
- 0 votes
You can use a CSS selector to extract the text content of the li elements, without the ::marker pseudo-elements, like this:
```
li_elements = soup.select('li')
for li in li_elements:
    text = li.get_text(strip=True)
    print(text)
```
Note that this extracts the text content of every li element. The ::marker pseudo-element is generated by the browser when it renders a list item (it is the bullet or number) and is not part of the HTML source, so BeautifulSoup never sees it and there is nothing to filter on. A selector such as li:not(::marker) would not help either, because pseudo-elements are not valid inside :not(). The plain selector is all that is needed:
```
li_elements = soup.select('li')
```
This selects every li element in the parsed document.
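As a small self-contained sketch, the snippet below parses a hypothetical stand-in for page.text (the real page is not shown in the question) and collects the text of the list items. The `ul > li` selector is an assumption about the page layout, restricting the match to items that are direct children of a ul.

```python
from bs4 import BeautifulSoup

# Hypothetical raw HTML as a server might send it. The "::marker" lines
# shown in browser developer tools do not appear in this source text.
raw_html = """
<ul>
    <li>first item</li>
    <li>second item</li>
</ul>
"""

soup = BeautifulSoup(raw_html, "html.parser")

# Select only <li> elements that are direct children of a <ul>
items = [li.get_text(strip=True) for li in soup.select("ul > li")]
print(items)
```

If the same selector returns an empty list on the real page, one possible explanation is that the list is not in the downloaded HTML at all, for example because it is built by JavaScript after the page loads.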