I am trying to find several different tags at once using the find_all() method of BeautifulSoup. I found that passing a list of tag names returns all of those tags, but I also want to filter some of them by their attributes, and I am not sure how to do that.
This is the reference HTML structure.
<html>
<body>
<div>
<h4>Registered Customer Details</h4>
<div>
<div class='row'>
<div class='col-3'>Name :</div>
<div class='col-6'>ABC</div>
</div>
<div>
<div class='row'>
<div class='col-3'>Address :</div>
<div class='col-6'>India</div>
</div>
<div class="col-3 col-6 col-12 lo">
<a class="navbar-brand" href="#">
<img alt="image" class="img-responsive" src="/uploads/NEWLOGO.png"/>
</a>
</div>
<h4>Partner Details</h4>
<div>
<div class='row'>
<div class='col-4'>Partners :</div>
<div class='col-8'><table></table></div>
</div>
<div class="span3"></div>
<div class="span9"></div>
</div>
</body>
</html>
I’m trying to find the tags in a single call:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('test.html','r').read(),'lxml')
soup.find_all(['h4','div'])
The above script returns all h4 tags and all div tags, but I’m looking for all h4 tags and only those div tags whose class value is col-3, col-6, col-4 or col-8.
The call I have in mind might look something like this:
# for single value
soup.find_all(['h4', ['div',{'class':'col-3'}] ])
# for multiple value
soup.find_all(['h4', ['div',{'class':['col-3','col-6','col-4','col-8']}] ])
Output:
[<h4>Registered Customer Details</h4>, <h4>Partner Details</h4>]
Expected output:
[<h4>Registered Customer Details</h4>,
<div class='col-3'>Name :</div>,
<div class='col-6'>ABC</div>,
<div class='col-3'>Address :</div>,
<div class='col-6'>India</div>,
<h4>Partner Details</h4>,
<div class='col-4'>Partners :</div>,
<div class='col-8'><table></table></div>]
2 Answers
Try writing this:
First of all, the HTML should be valid so you can be sure the parser works as expected. Your example is missing some closing tags.
One option that would work for your example is the following CSS selector; you could adapt it and make it more specific if needed:
Example
Output
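A minimal sketch of how a CSS selector could cover this case, not necessarily the exact selector this answer had in mind. It assumes a trimmed, well-formed copy of the question's markup and the built-in html.parser instead of lxml; the names WANTED, selector and wanted are hypothetical. The attribute form div[class="col-3"] compares the whole class value, so the div with class "col-3 col-6 col-12 lo" is left out, as in the expected output. A function passed to find_all() is shown alongside as an equivalent filter.

```python
from bs4 import BeautifulSoup

# Trimmed, well-formed version of the question's markup (an assumption made
# for illustration; the original file has unclosed <div> tags).
html = """
<html><body>
<h4>Registered Customer Details</h4>
<div class="row">
  <div class="col-3">Name :</div>
  <div class="col-6">ABC</div>
</div>
<div class="col-3 col-6 col-12 lo"><a class="navbar-brand" href="#">logo</a></div>
<h4>Partner Details</h4>
<div class="row">
  <div class="col-4">Partners :</div>
  <div class="col-8"><table></table></div>
</div>
<div class="span3"></div>
</body></html>
"""

# html.parser ships with Python; the question uses lxml, which would also work.
soup = BeautifulSoup(html, "html.parser")

WANTED = {"col-3", "col-6", "col-4", "col-8"}

# In a CSS selector a comma means "or". [class="x"] matches only when the
# entire class attribute equals "x", unlike div.x, which would also match
# elements carrying additional classes.
selector = "h4, " + ", ".join(f'div[class="{c}"]' for c in sorted(WANTED))
by_css = soup.select(selector)

# Equivalent filter for find_all(): a function that receives each tag and
# returns True for the ones to keep.
def wanted(tag):
    if tag.name == "h4":
        return True
    classes = tag.get("class", [])
    return tag.name == "div" and len(classes) == 1 and classes[0] in WANTED

by_func = soup.find_all(wanted)

for tag in by_css:
    print(tag)
```

If divs that carry extra classes should be included as well, the selector could be loosened to the class form, for example h4, div.col-3, div.col-6, div.col-4, div.col-8.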