
How can I wrap the content from each <h4> up to the next <h4> inside the main element in <section> tags, when the <h4> elements sit at varying depths? In the example below, lines 3 to 5, lines 7 to 11, and lines 12 to 14 should each be enclosed in their own <section>.

<main class="main">
  <div class="styling-div"><h3>Main Heading 1</h3></div>
  <h4>Subheading</h4>
  <p>Text</p>
  <img src="src">
  <div class="styling-div"><div class="another-div"><h3>Main Heading 2</h3></div></div>
  <h4>Subheading</h4>
  <p>Text</p>
  <div class="styling-div-for-h5"><h5>Sub-subheading</h5></div>
  <p>Text</p>
  <p>Text</p>
  <div class="styling-div-for-h4"><h4>Subheading</h4></div>
  <p>Text</p>
  <figure>Image</figure>
</main>

I have no idea how to place the tags correctly.

2 Answers


  1. To wrap the parts of main that start at an h4 (at any depth) in sections, iterate over the direct children of the main element.

    For each child, check whether it is an h4 or contains one. If it does, insert a new section element before it and move the child into that section. If it does not, move it into the most recently created section, or leave it in place if no section has been created yet.

  2. You can try:

    from bs4 import BeautifulSoup, NavigableString

    html_text = '''
    <main class="main">
      <div class="styling-div"><h3>Main Heading 1</h3></div>
      <h4>Subheading</h4>
      <p>Text</p>
      <img src="src">
      <div class="styling-div"><div class="another-div"><h3>Main Heading 2</h3></div></div>
      <h4>Subheading</h4>
      <p>Text</p>
      <div class="styling-div-for-h5"><h5>Sub-subheading</h5></div>
      <p>Text</p>
      <p>Text</p>
      <div class="styling-div-for-h4"><h4>Subheading</h4></div>
      <p>Text</p>
      <figure>Image</figure>
    </main>'''
    
    soup = BeautifulSoup(html_text, 'html.parser')
    
    def is_h4_or_contains_h4(tag):
        return tag.name == 'h4' or tag.find('h4')
    
    def is_h1234_or_contains_h1234(tag):
        tags = {'h1', 'h2', 'h3', 'h4'}
        return tag.name in tags or tag.find(tags)
    
    for tag_in_main in soup.select('.main > *'):
        if is_h4_or_contains_h4(tag_in_main):
    
            # Collect the following siblings, up to the next h1-h4, into a new <section>
            s = '\n<section>\n'
            for sibling in tag_in_main.find_next_siblings():
                if is_h1234_or_contains_h1234(sibling):
                    break
                s += str(sibling) + '\n'
                sibling.extract()
            s += '</section>\n'
            tag_in_main.insert_after(BeautifulSoup(s, 'html.parser'))
    
    # clear empty lines
    soup.smooth()
    main = soup.select_one('.main')
    for i, c in enumerate(main.contents):
        if isinstance(c, NavigableString):
            main.contents[i].replace_with('\n')
    
    print(soup)
    

    Prints:

    <main class="main">
    <div class="styling-div"><h3>Main Heading 1</h3></div>
    <h4>Subheading</h4>
    <section>
    <p>Text</p>
    <img src="src"/>
    </section>
    <div class="styling-div"><div class="another-div"><h3>Main Heading 2</h3></div></div>
    <h4>Subheading</h4>
    <section>
    <p>Text</p>
    <div class="styling-div-for-h5"><h5>Sub-subheading</h5></div>
    <p>Text</p>
    <p>Text</p>
    </section>
    <div class="styling-div-for-h4"><h4>Subheading</h4></div>
    <section>
    <p>Text</p>
    <figure>Image</figure>
    </section>
    </main>
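
The approach described in the first answer (walk the direct children of main and start a new section at each child that is or contains an h4) can be sketched roughly as below. This is only a minimal sketch, not a definitive implementation: the names `wrap_h4_sections` and `is_or_contains` are made up for illustration, and closing a section at the next h1-h3 is an assumption taken from the line ranges in the question. Unlike the output printed above, it keeps each h4 (or its wrapper div) inside the section, which is what the question asks for.

```python
from bs4 import BeautifulSoup, Tag


def is_or_contains(node, names):
    """True if node is a tag named in `names` or has such a descendant."""
    if not isinstance(node, Tag):
        return False  # whitespace and other strings never match
    return node.name in names or node.find(list(names)) is not None


def wrap_h4_sections(html):
    """Hypothetical helper: wrap each h4-led run of main's children in <section>."""
    soup = BeautifulSoup(html, "html.parser")
    main = soup.find("main")
    section = None  # the section currently being filled, if any

    # Iterate over a copy, because children are moved during the loop.
    for child in list(main.contents):
        if is_or_contains(child, ("h4",)):
            # Start a new section in front of this child and move it inside.
            section = soup.new_tag("section")
            child.insert_before(section)
            section.append(child)
        elif is_or_contains(child, ("h1", "h2", "h3")):
            # Assumption: a higher-level heading ends the open section.
            section = None
        elif section is not None:
            # Ordinary content: move it into the open section.
            section.append(child)
    return soup


if __name__ == "__main__":
    sample = (
        '<main class="main"><div><h3>Main Heading 1</h3></div>'
        "<h4>Subheading</h4><p>Text</p>"
        "<div><h4>Subheading</h4></div><p>Text</p></main>"
    )
    print(wrap_h4_sections(sample).prettify())
```

Because the existing nodes are moved rather than re-parsed from strings, any attributes and nested markup on them are carried over unchanged.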
    