from bs4 import BeautifulSoup
import re

# Triple quotes are needed because the markup spans several lines
# and contains both single and double quotes.
text = """<tr>
<td style="width:127.5pt;padding:3.75pt 0in 3.75pt 0in" width="170">
<p class="MsoNormal"><span style="font-size:11.0pt">Job #<o:p></o:p></span></p>
</td>
<td style="padding:3.75pt 0in 3.75pt 3.75pt">
<p class="MsoNormal"><strong><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'>TEST-12311</span></strong><span style="font-size:11.0pt"><o:p></o:p></span></p>
</td>
</tr>"""
soup = BeautifulSoup(text, "html.parser")
print(soup)
job_number = soup.find("span", string="Job #")
print(job_number)  # prints None

When I search for Job #, the result is None, even though there is a <span> with the text Job #.

Is there a way to find the text of a <span> whose <td> is followed by another <td>?

2 Answers


  1. Try:

    soup.find_all('span')[1].text
    

    It gives you:

    TEST-12311
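
A likely reason for the None in the question: when a tag name is given, `string=` is compared against the tag's `.string`, which is None as soon as the tag has more than one child (here the text plus an empty <o:p>). The sketch below illustrates this and looks the value up by label text instead of by position. The markup is a trimmed copy of the question's row, and the helper name `has_job_label` is made up for this example.

```python
from bs4 import BeautifulSoup

# Trimmed-down version of the row from the question (styles removed).
html = """<tr>
<td><p class="MsoNormal"><span>Job #<o:p></o:p></span></p></td>
<td><p class="MsoNormal"><strong><span>TEST-12311</span></strong><span><o:p></o:p></span></p></td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")

# The first span has two children (a text node and <o:p>), so .string is None.
first_span = soup.find("span")
span_string = first_span.string


def has_job_label(tag):
    """Hypothetical helper: match a <span> whose visible text is exactly 'Job #'."""
    return tag.name == "span" and tag.get_text(strip=True) == "Job #"


# find() also accepts a function, so the match can use get_text() instead of .string.
label_span = soup.find(has_job_label)
label_text = label_span.get_text(strip=True)

# Walk up to the label's <td>, then across to the next <td> in the same row.
value_cell = label_span.find_parent("td").find_next_sibling("td")
job_number = value_cell.get_text(strip=True)

print(span_string, label_text, job_number)
```

Unlike an index such as `[1]`, this does not depend on how many <span> elements come before the value.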
    
  2. "I have to check whether Job # is there or not in the HTML content."

    You could use CSS selectors with the :-soup-contains pseudo-class to check whether an element contains a string:

    soup.select_one('span:-soup-contains("Job #")')
    

    Or, to also check that the containing <td> is immediately followed by a sibling <td>:

    soup.select_one('td:-soup-contains("Job #"):has(+ td)')
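
Since select_one returns either a tag or None, the two selectors above can be turned into a plain yes/no check. This is only a sketch: the function name `contains_label` and the sample strings are invented here, and it assumes a soupsieve version that supports `:-soup-contains` (2.1 or later, as far as I know) and a label without double quotes in it.

```python
from bs4 import BeautifulSoup


def contains_label(html, label):
    """Hypothetical helper: True if some <span> contains `label` as a substring."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one returns the first matching Tag, or None when nothing matches.
    # Assumption: `label` has no double quotes, so it can be embedded as-is.
    return soup.select_one(f'span:-soup-contains("{label}")') is not None


with_label = "<tr><td><span>Job #<o:p></o:p></span></td><td><span>TEST-12311</span></td></tr>"
without_label = "<tr><td><span>Order</span></td><td><span>TEST-12311</span></td></tr>"

found = contains_label(with_label, "Job #")
missing = contains_label(without_label, "Job #")
print(found, missing)
```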
    

    Going the other way, this combination selects the <td> that directly follows a <td> containing a <span> with your string:

    soup.select_one('td:has(span:-soup-contains("Job #")) + td').get_text(strip=True)
    

    Or, less strictly, with the string allowed anywhere inside the first <td>:

    soup.select_one('td:-soup-contains("Job #") + td').get_text(strip=True)
    

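    Both of the above give you TEST-12311, but only if your string is found in the preceding sibling <td>. If nothing matches, select_one returns None and calling .get_text() on it raises an AttributeError.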
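
One way to guard against the no-match case is to test the result of select_one before reading its text. A minimal sketch, assuming the same label/value row layout as in the question; `value_after_label` and the "Invoice #" label are invented for illustration, and the label is assumed not to contain double quotes.

```python
from bs4 import BeautifulSoup


def value_after_label(soup, label):
    """Hypothetical helper: text of the <td> right after the <td> containing `label`."""
    cell = soup.select_one(f'td:-soup-contains("{label}") + td')
    # Return None instead of raising AttributeError when the label is absent.
    return cell.get_text(strip=True) if cell is not None else None


html = """<tr>
<td><p><span>Job #<o:p></o:p></span></p></td>
<td><p><strong><span>TEST-12311</span></strong><span><o:p></o:p></span></p></td>
</tr>"""
soup = BeautifulSoup(html, "html.parser")

job_value = value_after_label(soup, "Job #")
absent_value = value_after_label(soup, "Invoice #")
print(job_value, absent_value)
```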
