Beautiful Soup - ignore `<span>` while providing `string` to `find()` method

ianmayo
April 6, 2023
249 views
0 votes
2 Answers

I am parsing some text in Python, using BeautifulSoup4.

The address block starts with a cell like this:

<td><strong>Address</strong></td>

I find the above cell using soup.find("td", string="Address")

But now some address cells also contain a highlight character, like this:

<td><strong><span>*</span>Address</strong></td>

This has broken my matching. Is there still a way to find this cell?

Tags: beautifulsoup html python

Answers

Chosen as BEST ANSWER
- ianmayo
- April 6, 2023 at 6:47 pm
- 0 votes
I ended up with a solution like this:
```
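# Assumes `soup` is the BeautifulSoup object for the parsed page.
strong_blocks = soup.find_all("strong")

def has_address_text(tag):
    # recursive=False checks only the direct children of <strong>,
    # so text nested inside the <span> is ignored.
    return tag.find(string="Address", recursive=False) is not None

address_blocks = [tag for tag in strong_blocks if has_address_text(tag)]
if len(address_blocks) == 1:
    address_strong = address_blocks[0]
    address_cell = address_strong.parent  # the enclosing <td>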
```
The trick was that once I had a list of <strong> elements, I could use recursive=False to restrict the text search to each element's direct children, so the contents of the <span> are never inspected.
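
For anyone who wants to try this in isolation, here is a minimal self-contained sketch of the same idea. The sample HTML and the helper name find_label_cell are made up for illustration and are not from my real page; the only Beautiful Soup calls used are find_all(), find(string=..., recursive=False) and .parent.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one plain label and one with a highlight <span>.
HTML = """
<table>
  <tr><td><strong>Name</strong></td><td>Example name</td></tr>
  <tr><td><strong><span>*</span>Address</strong></td><td>Example street</td></tr>
</table>
"""

def find_label_cell(soup, label):
    """Return the <td> whose <strong> has `label` as a direct text child, else None."""
    matches = [
        strong
        for strong in soup.find_all("strong")
        # recursive=False looks only at direct children of <strong>,
        # so the "*" inside <span> is never compared against the label.
        if strong.find(string=label, recursive=False) is not None
    ]
    # Same check as above: accept the result only if it is unambiguous.
    return matches[0].parent if len(matches) == 1 else None

soup = BeautifulSoup(HTML, "html.parser")
cell = find_label_cell(soup, "Address")
print(cell)
```

The original soup.find("td", string="Address") stops matching because a tag's .string is None as soon as the tag has more than one child, which is the case for <strong> once the <span> is added.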


- MdFazlulHoque
- April 6, 2023 at 6:21 pm
- 0 votes
You can try using either a CSS selector or a regular expression (re), as follows:
```
# :-soup-contains() is the current name for the deprecated :contains()
soup.select('td:has(strong:-soup-contains("Address"))')
```
OR
```
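import re
# Match the text node itself (a <td> with mixed children has no .string), then walk up to the <td>
text_node = soup.find(string=re.compile("Address"))
address_cell = text_node.find_parent("td") if text_node else None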
```
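
Here is a small runnable sketch of both approaches against the highlighted markup from the question. The sample HTML and variable names are only for illustration. It uses :-soup-contains(), the current spelling of the deprecated :contains() pseudo-class, and searches for the text node before climbing to the cell, since a <td> whose <strong> has several children has no .string to match.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup with the highlighted label from the question.
HTML = (
    "<table><tr>"
    "<td><strong><span>*</span>Address</strong></td>"
    "<td>Example street</td>"
    "</tr></table>"
)
soup = BeautifulSoup(HTML, "html.parser")

# CSS selector: :-soup-contains() tests the combined text of <strong>
# and its descendants ("*Address"), so the <span> does not get in the way.
cells_css = soup.select('td:has(strong:-soup-contains("Address"))')

# Regex: search the text nodes directly, then climb to the enclosing <td>.
text_node = soup.find(string=re.compile("Address"))
cell_re = text_node.find_parent("td") if text_node is not None else None

print(cells_css)
print(cell_re)
```

Note that both variants match substrings, so a cell labelled "Address line 2" would also be found; anchor the regex (for example "^Address$") if you need an exact match.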