Unable to find text in HTML content with BeautifulSoup

JohnMathew
December 30, 2023
237 views
0 votes
2 Answers

from bs4 import BeautifulSoup

# Triple quotes let the HTML span several lines and contain both quote styles.
text = """<tr>
<td style="width:127.5pt;padding:3.75pt 0in 3.75pt 0in" width="170">
<p class="MsoNormal"><span style="font-size:11.0pt">Job #<o:p></o:p></span></p>
</td>
<td style="padding:3.75pt 0in 3.75pt 3.75pt">
<p class="MsoNormal"><strong><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'>TEST-12311</span></strong><span style="font-size:11.0pt"><o:p></o:p></span></p>
</td>
</tr>"""
soup = BeautifulSoup(text, "html.parser")
print(soup)
job_number = soup.find("span", string="Job #")
print(job_number)  # prints None

When I search for Job #, the result is None, even though there is a <span> with the text Job #.

Is there a way to find a <span> by its text when its <td> is followed by another <td>?

Answers

- Alderven
- December 29, 2023 at 9:00 am
- 0 votes
Try:
```
soup.find_all('span')[1].text
```
It gives you:
```
TEST-12311
```
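A positional index like [1] breaks as soon as the row gains or loses a <span>. As a hedged alternative, the sketch below looks the label up by its text instead. It also shows the likely reason the original find() returned None: the label <span> holds both the text and an <o:p> tag, so its .string is None and string="Job #" has nothing to compare against. The trimmed-down HTML and the helper name is_job_label are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Simplified version of the row from the question (styles removed).
html = """<tr>
<td><p><span>Job #<o:p></o:p></span></p></td>
<td><p><strong><span>TEST-12311</span></strong><span><o:p></o:p></span></p></td>
</tr>"""
soup = BeautifulSoup(html, "html.parser")

# The label span has two children (a text node and <o:p>), so .string is None.
label = soup.find("span")
print(label.string)      # None
print(label.get_text())  # Job #


def is_job_label(tag):
    """Hypothetical helper: match a <span> whose full text is 'Job #'."""
    return tag.name == "span" and tag.get_text(strip=True) == "Job #"


# Find the label by text, then walk to the next <td> in the same row.
job_label = soup.find(is_job_label)
value_cell = job_label.find_parent("td").find_next_sibling("td")
job_number = value_cell.get_text(strip=True)
print(job_number)        # TEST-12311
```

If the label may be missing, check job_label for None before calling find_parent().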

- HedgeHog
- December 29, 2023 at 11:29 am
- 0 votes
"I have to check whether Job # is present in the HTML content"

You could use CSS selectors with a pseudo-class to check whether an element contains a string:
```
soup.select_one('span:-soup-contains("Job #")')
```
or, to check whether it also has an adjacent sibling <td>:
```
soup.select_one('td:-soup-contains("Job #"):has(+ td)')
```
The other way around, this combination selects the sibling <td> of a <td> that contains a <span> with your string:
```
soup.select_one('td:has(span:-soup-contains("Job #")) + td').get_text(strip=True)
```
or, less strictly:
```
soup.select_one('td:-soup-contains("Job #") + td').get_text(strip=True)
```
Both of the above give you TEST-12311, but only if your string was found in the previous sibling <td>. Otherwise select_one() returns None and calling .get_text() on it raises an AttributeError.
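To cover both the presence check and the value lookup without risking an AttributeError, the selectors can be wrapped in a small function. This is only a sketch: value_after_label and SAMPLE_ROW are hypothetical names, the HTML is a trimmed-down copy of the row from the question, and :-soup-contains() assumes Soup Sieve 2.1 or newer.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Simplified version of the row from the question (styles removed).
SAMPLE_ROW = """<tr>
<td><p class="MsoNormal"><span>Job #<o:p></o:p></span></p></td>
<td><p class="MsoNormal"><strong><span>TEST-12311</span></strong><span><o:p></o:p></span></p></td>
</tr>"""


def value_after_label(html: str, label: str) -> Optional[str]:
    """Return the text of the <td> right after the <td> containing label.

    Hypothetical helper; assumes label contains no double quotes, because
    it is interpolated into the selector string as-is.
    """
    soup = BeautifulSoup(html, "html.parser")
    cell = soup.select_one(f'td:-soup-contains("{label}") + td')
    # select_one() returns None when nothing matches, so guard before get_text().
    return cell.get_text(strip=True) if cell is not None else None


print(value_after_label(SAMPLE_ROW, "Job #"))      # TEST-12311
print(value_after_label(SAMPLE_ROW, "Invoice #"))  # None
```

A None result then doubles as the answer to "is Job # there or not".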