
I'm working with BeautifulSoup to find all elements on a page that refer to static files.

What is the list of all static-related tags and elements?

Is there a way to do this other than calling BeautifulSoup.findAll() several times with different arguments?

My current best version looks like this:

static_ = (souped.findAll('link', rel='stylesheet') +
           souped.findAll('img') +
           souped.findAll('script') +
           souped.findAll('video'))

I'm not sure it's complete; maybe there are some elements I missed.

2 Answers


  1. Note: In newer code, avoid the old findAll() syntax and use find_all() or select() instead. For more, take a minute to check the BeautifulSoup docs.

    Based on your question, you could combine your tags into a single CSS selector list:

    soup.select('img, script, video, link[rel="stylesheet"]')
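
    To see what that selector picks up, here is a minimal sketch. The sample HTML and the variable names are made up for illustration, and it uses the built-in html.parser so no extra parser needs to be installed. Note that the favicon link is skipped because its rel is not "stylesheet".

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, only for illustration.
html = """
<html><head>
  <link rel="stylesheet" href="style.css">
  <link rel="icon" href="favicon.ico">
  <script src="app.js"></script>
</head><body>
  <img src="logo.png">
  <video src="clip.mp4"></video>
  <a href="/about">About</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# One comma-separated selector list replaces four separate find_all() calls.
static_tags = soup.select('img, script, video, link[rel="stylesheet"]')

names = [tag.name for tag in static_tags]
print(names)
```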
    
  2. find_all() accepts a function as its argument, which can be used to filter the tags; see the docs.

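    from bs4 import BeautifulSoup

    def filter_static(tag):
        # Tags that always count as static content here.
        if tag.name in {'img', 'script', 'video'}:
            return True
        # rel is a multi-valued attribute, so BeautifulSoup returns it as a list.
        if tag.name == 'link':
            return 'stylesheet' in tag.get('rel', [])
        return False

    # html is assumed to already hold the page source as a string.
    soup = BeautifulSoup(html, 'lxml')
    for i, match in enumerate(soup.find_all(filter_static)):
        print(f"[{i}] {match}")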
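
    As a quick sanity check, the sketch below runs a filter function like the one above against the CSS selector from the other answer on a small sample. The sample HTML and variable names are hypothetical, and html.parser is used here instead of lxml only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

def filter_static(tag):
    """Return True for img/script/video tags and for stylesheet links."""
    if tag.name in {"img", "script", "video"}:
        return True
    # rel is multi-valued, so it comes back as a list of strings.
    return tag.name == "link" and "stylesheet" in tag.get("rel", [])

# Hypothetical sample markup, only for illustration.
sample_html = """
<head>
  <link rel="stylesheet" href="main.css">
  <link rel="icon" href="favicon.ico">
  <script src="main.js"></script>
</head>
<body><img src="a.png"><p>text</p></body>
"""

soup = BeautifulSoup(sample_html, "html.parser")

by_function = soup.find_all(filter_static)
by_selector = soup.select('img, script, video, link[rel="stylesheet"]')

# Compare by identity: both approaches should pick the same tag objects.
same_tags = {id(t) for t in by_function} == {id(t) for t in by_selector}
print(same_tags)
```

    The function form is more verbose, but it is easier to extend with conditions that a CSS selector cannot express.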
    