I am parsing some text in Python, using BeautifulSoup4.
The address block starts with a cell like this:
<td><strong>Address</strong></td>
I find the above cell using soup.find("td", "Address")
But, now some addresses have a highlight character too, like this:
<td><strong><span>*</span>Address</strong></td>
This has broken my matching. Is there still a way to find this TR?
2 Answers
I ended up with a solution like this:
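A minimal sketch of that kind of approach, assuming the label is always a direct text child of <strong> and that <strong> sits directly inside the <td>. The sample HTML and the helper name find_address_cells are made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical sample data covering both the plain and the highlighted form.
html = """
<table>
  <tr><td><strong>Address</strong></td><td>1 Main St</td></tr>
  <tr><td><strong><span>*</span>Address</strong></td><td>2 Side St</td></tr>
  <tr><td><strong>Phone</strong></td><td>555-0100</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def find_address_cells(soup, label="Address"):
    """Return every <td> whose <strong> child has `label` as direct text."""
    cells = []
    for strong in soup.find_all("strong"):
        # recursive=False limits the search to direct children of <strong>,
        # so the "*" inside <span> is never compared against the label.
        has_label = strong.find(string=label, recursive=False) is not None
        if has_label and strong.parent.name == "td":
            cells.append(strong.parent)
    return cells


for cell in find_address_cells(soup):
    # The value is assumed to live in the next <td> of the same row.
    print(cell.find_next_sibling("td").get_text())
```

From each returned cell, cell.find_parent("tr") gives the enclosing row.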
The trick was that once I had a list of <strong> elements, I was able to use recursive=False to prevent the <span> being inspected.

You can try using either a CSS selector or a regular expression (re).
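Both variants might look something like the sketch below. It assumes a BeautifulSoup install whose bundled soupsieve supports :has() and :-soup-contains() (soupsieve 2.1 or later); the sample HTML is made up for illustration. Note that :-soup-contains() does a substring match over all descendant text, so it would also match a label such as "Billing Address", whereas the anchored regex only matches a text node that is exactly "Address".

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample data covering both the plain and the highlighted form.
html = """
<table>
  <tr><td><strong>Address</strong></td><td>1 Main St</td></tr>
  <tr><td><strong><span>*</span>Address</strong></td><td>2 Side St</td></tr>
  <tr><td><strong>Phone</strong></td><td>555-0100</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Variant 1: CSS selector. Select every <td> that has a direct <strong>
# child whose text (including nested tags) contains "Address".
css_cells = soup.select('td:has(> strong:-soup-contains("Address"))')

# Variant 2: regular expression. Match the text nodes themselves, then
# walk up to the enclosing <td>. The "*" lives in its own text node
# inside <span>, so it does not interfere with the match.
label = re.compile(r"^\s*Address\s*$")
re_cells = []
for text_node in soup.find_all(string=label):
    td = text_node.find_parent("td")
    if td is not None:
        re_cells.append(td)

for cell in css_cells:
    print(cell.find_parent("tr"))
```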