I am trying to find several different tags at once using BeautifulSoup's find_all() method. I found that passing a list of tag names returns all of those tags, but I also want to filter some of the tags by their attributes, and I am not sure how to do that.

This is the reference HTML structure.

<html>
    <body>
        <div>
            <h4>Registered Customer Details</h4>
            <div>
                <div class='row'>
                <div class='col-3'>Name :</div>
                <div class='col-6'>ABC</div>
            </div>
            <div>
                <div class='row'>
                <div class='col-3'>Address :</div>
                <div class='col-6'>India</div>
            </div>
            <div class="col-3 col-6 col-12 lo">
                <a class="navbar-brand" href="#">
                    <img alt="image" class="img-responsive" src="/uploads/NEWLOGO.png"/>
                </a>
            </div>
            <h4>Partner Details</h4>
            <div>
                <div class='row'>
                <div class='col-4'>Partners :</div>
                <div class='col-8'><table></table></div>
            </div>
            <div class="span3"></div>
            <div class="span9"></div>
        </div>
    </body>
</html>

I'm trying to find these tags in a single call.

from bs4 import BeautifulSoup
soup = BeautifulSoup(open('test.html','r').read(),'lxml')
soup.find_all(['h4','div'])

The above script returns all h4 tags and all div tags, but I'm looking for all h4 tags plus only those div tags whose class is col-3, col-6, col-4, or col-8.

I imagine the call might look something like this:

# for single value
soup.find_all(['h4', ['div',{'class':'col-3'}] ])

# for multiple value
soup.find_all(['h4', ['div',{'class':['col-3','col-6','col-4','col-8']}] ])

Output:

[<h4>Registered Customer Details</h4>, <h4>Partner Details</h4>]

Expected output:

[<h4>Registered Customer Details</h4>,
 <div class='col-3'>Name :</div>,
 <div class='col-6'>ABC</div>,
 <div class='col-3'>Address :</div>,
 <div class='col-6'>India</div>,
 <h4>Partner Details</h4>,
 <div class='col-4'>Partners :</div>,
 <div class='col-8'><table></table></div>]

2 Answers


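  1. Try this. Note that class_ applies to every tag name in the list, so an h4 without a matching class is filtered out as well: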

    # for single value
    soup.find_all(['h4','div'], class_='col-3')
    
    # for multiple value
    soup.find_all(['h4','div'], class_=['col-3','col-4'])
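
To keep every h4 while filtering only the div tags by class, one option is to pass a function to find_all(). The following is a minimal sketch, not the only way to do it: the names WANTED and wanted_tag and the shortened HTML snippet are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened, well-formed version of the question's HTML (illustrative only).
html = """
<div>
  <h4>Registered Customer Details</h4>
  <div class="row">
    <div class="col-3">Name :</div>
    <div class="col-6">ABC</div>
  </div>
  <h4>Partner Details</h4>
  <div class="row">
    <div class="col-4">Partners :</div>
    <div class="col-8"><table></table></div>
  </div>
  <div class="span3"></div>
</div>
"""

# Hypothetical name: the class values the question asks for.
WANTED = {"col-3", "col-6", "col-4", "col-8"}


def wanted_tag(tag):
    """Return True for every h4, and for a div carrying at least one wanted class."""
    if tag.name == "h4":
        return True
    if tag.name == "div":
        # "class" is parsed as a list of values; a tag without it yields [].
        return bool(WANTED.intersection(tag.get("class", [])))
    return False


soup = BeautifulSoup(html, "html.parser")
matches = soup.find_all(wanted_tag)

for tag in matches:
    print(tag)
```

A div that carries several classes, such as class="col-3 col-6 col-12 lo" in the question, would also match here because it contains at least one wanted value. If that is not desired, the function could compare the full class list instead.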
    
  2. First of all, the HTML should be valid so that you can be sure the parser builds the tree you expect. Your sample is missing several closing tags.

    One option that works for your example is the following CSS selector:

    soup.select('h4,div[class^="col-"]')
    

    You could adapt it and make it more specific if needed:

    soup.select('h4, div.row div[class^="col-"]')
    ...
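
Another possible adaptation is to list the wanted classes explicitly. The prefix selector [class^="col-"] tests how the whole class attribute string begins, whereas div.col-3 tests for an individual class token. The sketch below compares the two on a made-up snippet; the extra div elements and their text are hypothetical and only there to show the difference.

```python
from bs4 import BeautifulSoup

# Made-up snippet: the last two divs exist only to show how the selectors differ.
html = """
<div>
  <h4>Partner Details</h4>
  <div class="row">
    <div class="col-4">Partners :</div>
    <div class="col-8">XYZ</div>
  </div>
  <div class="col-12 banner">Logo</div>
  <div class="wide col-3">Note</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Token-based: matches a div that has one of the listed classes,
# regardless of where it appears in the class attribute.
exact = soup.select("h4, div.col-3, div.col-6, div.col-4, div.col-8")

# Prefix-based: matches a div whose class attribute string starts with "col-".
prefix = soup.select('h4, div[class^="col-"]')

print([tag.get_text() for tag in exact])
print([tag.get_text() for tag in prefix])
```

In this snippet the explicit list picks up the "wide col-3" div and skips "col-12 banner", while the prefix selector does the opposite. Which behaviour is right depends on the real page.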
    
    Example
    from bs4 import BeautifulSoup
    html = '''<html>
        <body>
            <div>
                <h4>Registered Customer Details</h4>
                <div>
                    <div class='row'>
                    <div class='col-3'>Name :</div>
                    <div class='col-6'>ABC</div>
                </div>
                <div>
                    <div class='row'>
                    <div class='col-3'>Address :</div>
                    <div class='col-6'>India</div>
                </div>
                <h4>Partner Details</h4>
                <div>
                    <div class='row'>
                    <div class='col-4'>Partners :</div>
                    <div class='col-8'><table></table></div>
                </div>
                <div class="span3"></div>
                <div class="span9"></div>
            </div>
        </body>
    </html>'''
    soup = BeautifulSoup(html, 'html.parser')
    
    soup.select('h4,div[class^="col-"]')
    
    Output
    [<h4>Registered Customer Details</h4>,
     <div class="col-3">Name :</div>,
     <div class="col-6">ABC</div>,
     <div class="col-3">Address :</div>,
     <div class="col-6">India</div>,
     <h4>Partner Details</h4>,
     <div class="col-4">Partners :</div>,
     <div class="col-8"><table></table></div>]
    