
Is there a way to get a link from HTML that has an <a> tag and a <div> tag?

html1:

<a href="https://u50.ct.sendgrid.net/ls" target="_blank">
      <div class="subtitle">
       Service request #2226754
      </div></a>

html2:

<div class="subtitle">
      Service request <a href="https://u5024.ct.sendgrid.net/ls" style="color:#5A88AA; text-decoration:underline;" target="_blank">#2604467</a>
     </div>

code:

import re
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
scores_string = soup.find("div", string=re.compile('Service request', re.IGNORECASE))
print(scores_string)
ahref = scores_string.find_parent("a")
print(ahref["href"])  

Required output:
1) https://u50.ct.sendgrid.net/ls
2) https://u5024.ct.sendgrid.net/ls

I have two HTML snippets with different structures, and I need to extract the URL from both. Is there a solution using BeautifulSoup?

2 Answers


    1. Find the <div> with the class subtitle.

    div = soup.find('div', class_='subtitle')

    2. Find the <a> tag inside it.

    a_tag = div.find('a')

    3. Extract the href.

    link = a_tag['href']

    If the subtitle div is inside the <a> tag, look for the wrapping <a> instead (for example with div.find_parent('a')). You might also want to add error handling, since find() returns None when nothing matches.
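The steps above could be combined into one helper that covers both layouts. This is only a minimal sketch: the function name subtitle_link and the variables html1 and html2 are made up for illustration (the snippets are copied from the question), and it assumes the div always carries the class subtitle.

```python
from bs4 import BeautifulSoup


def subtitle_link(html):
    """Return the href associated with the 'subtitle' div, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="subtitle")
    if div is None:
        return None  # no subtitle div in this document
    # Layout 2: the <a> sits inside the div.
    # Layout 1: the <a> wraps the div, so fall back to the parent lookup.
    a_tag = div.find("a", href=True) or div.find_parent("a", href=True)
    return a_tag["href"] if a_tag is not None else None


# Hypothetical variable names; the markup is taken from the question.
html1 = """<a href="https://u50.ct.sendgrid.net/ls" target="_blank">
      <div class="subtitle">
       Service request #2226754
      </div></a>"""

html2 = """<div class="subtitle">
      Service request <a href="https://u5024.ct.sendgrid.net/ls" target="_blank">#2604467</a>
     </div>"""

print(subtitle_link(html1))  # https://u50.ct.sendgrid.net/ls
print(subtitle_link(html2))  # https://u5024.ct.sendgrid.net/ls
```

Returning None instead of raising keeps the caller in charge of what to do when a page has no matching link.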

  1. Implement a custom tag filter. This solution doesn't need an extra import for regexes, although for more complex cases a regex may be required or preferable.

    def f(tag):
      text = 'Service request'.casefold()
    
      if tag.name == "a" and 'href' in tag.attrs:
      
        for child_tag in tag.children:
          if child_tag.name == 'div' and child_tag.get_text(strip=True).casefold().startswith(text):
            return True
      
      if tag.name == 'div' and tag.get_text(strip=True).casefold().startswith(text):
      
        for child_tag in tag.children:
          if child_tag.name == "a" and 'href' in child_tag.attrs:
            return True
     
    # matches
    for m in soup.find_all(f):
      # if the match is the div, step down to its <a> child
      if m.name != 'a':
        m = m.a
        
      print(m['href'])
    