
I am trying to extract the direct text of a given HTML tag, that is, the text that belongs to the tag itself rather than to its children. For example, for <p> Hello! </p> the direct text is Hello!. My code works well except in the case below.

from bs4 import BeautifulSoup
soup = BeautifulSoup('<div> <i> </i> FF Services </div>', "html.parser")
for tag in soup.find_all():
    direct_text = tag.find(string=True, recursive=False)
    print(tag, ':', direct_text)

Output:

`<div> <i> </i> FF Services </div> :  `
`<i> </i> :  `

The first line of output should be <div> <i> </i> FF Services </div> : FF Services , but FF Services is missing. I found that when I delete <i> </i>, the code works fine.

What’s the problem here?

2 Answers


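  1. Using .find_all instead of .find returns every direct string of the tag rather than only the first one, so FF Services shows up in the output (next to the whitespace-only string). Try this code: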

    for tag in soup.find_all():
        direct_text = tag.find_all(string=True, recursive=False)
        print(tag, ':', direct_text)
    
  2. The issue is not with the BeautifulSoup methods. In fact your code works; you just fell into your own trap!^1 div.find(string=True) returns the first node that is a string. When the parser reads <div> <i>..., the <i> tag is there, but before it comes a NavigableString consisting of a single whitespace character. So your code does print something: a single space. Here is a test:

    print(f"{len(soup.div.find(string=True)) = }") # using same soup as in the question
    #len(soup.div.find(string=True)) = 1
    

    It is helpful to look at the tag’s content:

    for tag in soup.div.contents:
        print(f"-{tag}-", type(tag))
    
    #- - <class 'bs4.element.NavigableString'>
    #-<i> </i>- <class 'bs4.element.Tag'>
    #- FF Services - <class 'bs4.element.NavigableString'>
    

    Be aware of this note from the documentation:
    "If a tag contains more than one thing, then it’s not clear what .string should refer to, so .string is defined to be None:". To work around this, either use more specific navigation where possible, or use .strings or .stripped_strings together with some filtering.


    ^1 Something similar has happened to me too.
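
Building on the look at .contents above, here is a minimal sketch of one way to collect only the direct text of a tag while skipping whitespace-only strings. The helper name direct_text is hypothetical and not part of BeautifulSoup; it assumes that "direct text" means the tag's own string children, stripped and joined with a single space.

```python
from bs4 import BeautifulSoup, NavigableString


def direct_text(tag):
    """Return the tag's own text, ignoring text nested in child tags.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    # Keep only the children that are strings, not Tag objects.
    pieces = [
        child.strip()
        for child in tag.contents
        if isinstance(child, NavigableString)
    ]
    # Drop the whitespace-only strings (now empty) and join the rest.
    return " ".join(piece for piece in pieces if piece)


soup = BeautifulSoup("<div> <i> </i> FF Services </div>", "html.parser")
for tag in soup.find_all():
    print(tag, ":", repr(direct_text(tag)))
```

Note that .stripped_strings on its own walks all descendants, so it would also return text from nested tags; iterating over .contents keeps the result limited to the tag's direct children.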
