
Is there a way for me to use BeautifulSoup to get the text of tags that contain more than one word?

For example, if I had this HTML:

<div>
    <div>
        <a>hello there</a>
        <a>hi</a>
    </div>
    <a>what's up</a>
    <a>stackoverflow</a>
</div>

…I just want to get:

hello there what's up

2 Answers


  1. Yes. Find all the <a> tags and keep only those whose text contains a space, which indicates more than one word, then join the results:

    from bs4 import BeautifulSoup
    
    html = '''
    <div>
        <div>
            <a>hello there</a>
            <a>hi</a>
        </div>
        <a>what's up</a>
        <a>stackoverflow</a>
    </div>
    '''
    
    soup = BeautifulSoup(html, 'html.parser')
    
    target_tags = soup.find_all('a')  # Find all <a> tags
    multi_word_texts = []
    
    for tag in target_tags:
        if ' ' in tag.get_text():  # Check if the tag text contains a space (indicating multiple words)
            multi_word_texts.append(tag.get_text())
    
    result = ' '.join(multi_word_texts)
    print(result)
    
  2. If you prefer, you can also use BeautifulSoup's stripped_strings and keep only the strings that contain a space:

    ' '.join(s for s in soup.stripped_strings if ' ' in s)
    

    Alternatively, you can check each tag individually with .get_text(), but I would recommend stripping the text with .get_text(strip=True) before checking for a space; otherwise trailing whitespace, as in <a>hi </a>, would make a single word look like two.

    Example
    from bs4 import BeautifulSoup
    
    html = '''
    <div>
        <div>
            <a>hello there</a>
            <a>hi </a>
        </div>
        <a>what's up</a>
        <a>stackoverflow</a>
    </div>
    '''
    
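    # Name the parser explicitly; omitting it makes BeautifulSoup guess and emit a warning
    soup = BeautifulSoup(html, 'html.parser')
    
    # stripped_strings yields each text string with surrounding whitespace removed,
    # so 'hi ' becomes 'hi' and no longer contains a space
    print(' '.join(s for s in soup.stripped_strings if ' ' in s))  # hello there what's up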
    