I have collected all the head tags in the data given by
heads=str(soup.find_all(re.compile('^h[1-6]$')))
. Then i am collecting data in between the head tags. A portion of source code is given.
import bs4
import re
data = '''
<html>
<body>
<div class="mob-icon"> <span></span></div>
<nav id="nav">
<ul class="" id="menu-home-welcome-banner">
<li class="menu-item menu-item-type-custom menu-item-object-custom current-menu-parent menu-item-has-children menu-item-1778" id="menu-item-1778"> <a class="submeny-top" href="http://www.uvionicstech.com" ontouchstart="">Home</a> </li>
<!--<li id="menu-item-1785" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-1785"><a href="#about" class="scroll-to-link" ontouchstart="">About</a></li>-->
<li class="menu-item menu-item-type-custom menu-item-object-custom menu-item-1786" id="menu-item-1786"><a class="scroll-to-link" href="#data-analytics" ontouchstart="">PRODUCTS & SOLUTIONS</a></li>
<li class="menu-item menu-item-type-custom menu-item-object-custom menu-item-1787" id="menu-item-1787"><a class="scroll-to-link" href="#artificial-intelligence" ontouchstart="">Artificial Intelligence</a></li>
<!-- <li id="menu-item-1788" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-1788"><a href="#iot" class="scroll-to-link" ontouchstart="">IOT</a></li> -->
<li class="menu-item menu-item-type-custom menu-item-object-custom menu-item-1788" id="menu-item-1788"><a class="scroll-to-link" href="#services" ontouchstart="">All in One Place</a></li>
<li class="menu-item menu-item-type-custom menu-item-object-custom menu-item-1789" id="menu-item-1789"><a class="scroll-to-link" href="#eco-system" ontouchstart="">PARTNERS</a></li>
<li class="menu-item menu-item-type-custom menu-item-object-custom menu-item-1791" id="menu-item-1791"><a class="scroll-to-link" href="#contact" ontouchstart="">Contact</a></li>
<h3 class="h3 text-center">PARTNERS</h3>
<h3 class="vc_custom_heading titel-left wow" data-wow-delay="0.3s">
<span class="titel-line"></span>Artificial Intelligence </h3>
<h3 class="vc_custom_heading titel-left wow " data-wow-delay="0.3s"><span class="titel-line">
</span>Everything for your Business, <small>all in one place</small>
</h3>
</ul>
</nav>
</div>
</body>
</html>
'''
searched_word = 'Artificial Intelligence'
soup = bs4.BeautifulSoup(data, 'html.parser')
results = soup.body.find_all(string=re.compile('.*{0}.*'.format(searched_word)), recursive=True)
output:
results
['Artificial Intelligence',
'Artificial Intelligence ']
Here the first Artificial Intelligence
is of list item and second Artificial Intelligence
is of head tag. I am trying to find out the word only with head tag. How to get the word only has head tag? Is there any way to find the next few characters followed by the word Artificial Intelligence
. So that it will get Artificial Intelligence </h3>
. Then it will not consider the list item.
2
Answers
since it’s only the head tag you want, could we just grab those, then search through those?
gave output:
if there are no child tag in the headings like
you can combine your regex
but your
h3
contain child tag, I will create loop like first answer or create custom function to pass tofind_all()
results: