HTML - BeautifulSoup: get href having div text

Toninthomas
June 18, 2024

Is there a way to get the link from HTML that contains both an `<a>` tag and a `<div>` tag, where either one can wrap the other?

html1:

<a href="https://u50.ct.sendgrid.net/ls" target="_blank">
      <div class="subtitle">
       Service request #2226754
      </div></a>

html2:

<div class="subtitle">
      Service request <a href="https://u5024.ct.sendgrid.net/ls" style="color:#5A88AA; text-decoration:underline;" target="_blank">#2604467</a>
     </div>

code:

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
# Find the div whose text matches "Service request"
scores_string = soup.find("div", string=re.compile('Service request', re.IGNORECASE))
print(scores_string)
# Walk up to the enclosing <a> and read its href
ahref = scores_string.find_parent("a")
print(ahref["href"])

Expected output:
1) https://u50.ct.sendgrid.net/ls
2) https://u5024.ct.sendgrid.net/ls

I have two HTML snippets in different formats and need to extract the URL from both. Is there a solution using BeautifulSoup?
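A quick check, assuming Beautiful Soup 4 is installed, suggests why the code above finds html1 but not html2. The Beautiful Soup documentation says that combining a tag name with `string=` (or its older alias `text=`) matches tags whose `.string` matches, and `.string` is `None` when a tag has more than one child. The html2 `<div>` has both text and an `<a>` child, so its `.string` is `None`, while `get_text()` still returns the combined text. The snippets below are the question's HTML with whitespace trimmed.

```python
from bs4 import BeautifulSoup

# The two snippets from the question (whitespace trimmed).
html1 = '''<a href="https://u50.ct.sendgrid.net/ls" target="_blank">
  <div class="subtitle">
   Service request #2226754
  </div></a>'''

html2 = '''<div class="subtitle">
  Service request <a href="https://u5024.ct.sendgrid.net/ls" target="_blank">#2604467</a>
 </div>'''

div1 = BeautifulSoup(html1, "html.parser").find("div", class_="subtitle")
div2 = BeautifulSoup(html2, "html.parser").find("div", class_="subtitle")

# html1: the div holds a single text node, so .string is that text.
print(div1.string.strip())
# html2: the div holds text AND an <a>, so .string is None ...
print(div2.string)
# ... but get_text() still joins all descendant text.
print(div2.get_text(" ", strip=True))
```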

Answers

- LiteralGoat
- June 18, 2024 at 8:13 am
1. Find the `<div>` with the class `subtitle`.
div = soup.find('div', class_='subtitle')
2. Find the `<a>` tag inside it.
a_tag = div.find('a')
3. Extract the href.
link = a_tag['href']
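A runnable version of these three steps for html2, with simple `None` checks as a form of error handling. It assumes `bs4` is installed; `html2` is the question's second snippet.

```python
from bs4 import BeautifulSoup

html2 = '''<div class="subtitle">
  Service request <a href="https://u5024.ct.sendgrid.net/ls" target="_blank">#2604467</a>
 </div>'''

soup = BeautifulSoup(html2, "html.parser")

link = None
# Step 1: the <div> with class "subtitle".
div = soup.find("div", class_="subtitle")
if div is not None:
    # Step 2: the <a> tag nested inside that div.
    a_tag = div.find("a")
    # Step 3: read href, guarding against a missing tag or attribute.
    if a_tag is not None:
        link = a_tag.get("href")

print(link)
```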

If the subtitle `<div>` is inside the `<a>` tag (as in html1), look for the wrapping `<a>` instead, for example with `div.find_parent('a')`. You may also want to add error handling, since `find()` returns `None` when nothing matches.
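One way to cover both layouts with a single function, sketched under the assumption that the target `<div>` always has class `subtitle`: look for an `<a>` inside the div first and, if there is none, fall back to the enclosing `<a>` via `find_parent()`. The function name `subtitle_link` is hypothetical.

```python
from bs4 import BeautifulSoup

html1 = '''<a href="https://u50.ct.sendgrid.net/ls" target="_blank">
  <div class="subtitle">
   Service request #2226754
  </div></a>'''

html2 = '''<div class="subtitle">
  Service request <a href="https://u5024.ct.sendgrid.net/ls" target="_blank">#2604467</a>
 </div>'''


def subtitle_link(html):
    """Return the href tied to the 'subtitle' div, or None (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="subtitle")
    if div is None:
        return None
    # html2 layout: the <a href> is nested inside the div.
    a_tag = div.find("a", href=True)
    if a_tag is None:
        # html1 layout: the div is wrapped by the <a href>.
        a_tag = div.find_parent("a", href=True)
    return a_tag["href"] if a_tag is not None else None


for html in (html1, html2):
    print(subtitle_link(html))
```

Checking the child first and the parent second keeps one code path for both formats; swap the order if the wrapping layout is the more common one.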

Implement a custom tag filter. This solution doesn't need an extra import for regexes, but for more complex cases one may be required or advisable.

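def f(tag):
    """Match an <a href> wrapping a 'Service request' <div>, or such a <div> containing an <a href>."""
    text = 'Service request'.casefold()

    # Case 1 (html1): <a href> whose direct child <div> starts with the text
    if tag.name == "a" and 'href' in tag.attrs:
        for child_tag in tag.children:
            if child_tag.name == 'div' and child_tag.get_text(strip=True).casefold().startswith(text):
                return True

    # Case 2 (html2): <div> starting with the text that has a direct <a href> child
    if tag.name == 'div' and tag.get_text(strip=True).casefold().startswith(text):
        for child_tag in tag.children:
            if child_tag.name == "a" and 'href' in child_tag.attrs:
                return True

    return False

# Print the href of every match (soup is the parsed HTML from the question)
for m in soup.find_all(f):
    # "Destructuring": a matched <div> holds the link, so step down to its <a>
    if m.name != 'a':
        m = m.a

    print(m['href'])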
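The answer notes that a regex may be needed for more complex cases. A sketch of the same filter idea using `re`, assuming the text should start with "Service request" (case-insensitive, after optional leading whitespace); `has_service_link` and `SERVICE_RE` are hypothetical names, and both snippets are parsed together to show that `find_all()` returns one match per layout.

```python
import re
from bs4 import BeautifulSoup

# Assumed pattern: "Service request" at the start, ignoring case and leading whitespace.
SERVICE_RE = re.compile(r"^\s*service request", re.IGNORECASE)

html = '''<a href="https://u50.ct.sendgrid.net/ls" target="_blank">
  <div class="subtitle">
   Service request #2226754
  </div></a>
<div class="subtitle">
  Service request <a href="https://u5024.ct.sendgrid.net/ls" target="_blank">#2604467</a>
 </div>'''


def has_service_link(tag):
    """Match an <a href> wrapping a matching <div>, or a matching <div> holding an <a href>."""
    if tag.name == "a" and tag.has_attr("href"):
        return any(c.name == "div" and SERVICE_RE.match(c.get_text()) for c in tag.children)
    if tag.name == "div" and SERVICE_RE.match(tag.get_text()):
        return any(c.name == "a" and c.has_attr("href") for c in tag.children)
    return False


soup = BeautifulSoup(html, "html.parser")
hrefs = []
for m in soup.find_all(has_service_link):
    # A matched <div> holds the link; a matched <a> is the link itself.
    a_tag = m if m.name == "a" else m.a
    hrefs.append(a_tag["href"])

print(hrefs)
```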
