Html - JSOUP find identical tag elements that are not separated by other tags

JakeMorgan
August 17, 2023
198 views
0 votes
2 Answers

Code example:

<body>
    <p>123124</p>
    <p>214565</p>
    <p>123333</p>
    <img src = "aboba"/>
    <p>faaaa</p>
    <p>aaaaa</p>
    <a href="src"></a>
    <p>1111</p>
</body>

I am using jsoup and want to wrap each run of consecutive <p> elements in <div class="test"></div>.

I expect the result to look something like this:

Result:

<div class="test">
    <p>123124</p>
    <p>214565</p>
    <p>123333</p>
</div>
<img src = "aboba"/>
<div class="test">
    <p>faaaa</p>
    <p>aaaaa</p>
</div>
<a href="src"></a>
<div class="test">
    <p>1111</p>
</div>

UPDATE: corrected the expected result.

Answers

To find identical HTML tag elements that are not separated by other tags using Jsoup, you can follow these steps:

Parse the HTML document using Jsoup.
Use the select method to find the specific tag elements you’re interested in.
Iterate through the found elements and check if they are adjacent to each other in the document’s structure.

Here’s a code example in Java that demonstrates this process:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupIdenticalTags {
    public static void main(String[] args) {
        String html = "<div><p>Text 1</p><p>Text 2</p><p>Text 3</p><p>Text 4</p></div>";

        // Parse the HTML document
        Document document = Jsoup.parse(html);

        // Specify the tag name you're interested in (e.g., "p" for <p> tags)
        String tagName = "p";

        // Find all elements with the specified tag name
        Elements elements = document.select(tagName);

        // Iterate through the elements and check for adjacency
        for (int i = 1; i < elements.size(); i++) {
            Element prevElement = elements.get(i - 1);
            Element currentElement = elements.get(i);

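            // Check that currentElement directly follows prevElement under the same parent.
            // siblingIndex() also counts text nodes and does not compare parents,
            // so compare against the previous element sibling instead.
            if (currentElement.previousElementSibling() == prevElement) {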
                System.out.println("Identical elements found:");
                System.out.println(prevElement);
                System.out.println(currentElement);
                System.out.println("-------");
            }
        }
    }
}

The HTML contains a <div> with multiple <p> elements. The code iterates through the <p> elements and checks if they are siblings (adjacent) to each other. If they are, it prints out the identical adjacent elements.

You can adjust the tagName variable to the specific tag name you’re interested in (e.g., "p" for <p> tags).
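
The pairwise check above reports each adjacent pair separately. To collect whole runs instead, a single pass that starts a new group whenever the tag name changes may be enough. Below is a minimal sketch of that grouping step on plain strings, without jsoup; the class TagRuns and the method groupRuns are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of jsoup: splits a sequence of tag names
// into runs of consecutive equal names.
class TagRuns {
    static List<List<String>> groupRuns(List<String> tags) {
        List<List<String>> runs = new ArrayList<>();
        List<String> current = null;
        String previous = null;
        for (String tag : tags) {
            // Start a new run whenever the tag differs from the one before it
            if (current == null || !tag.equals(previous)) {
                current = new ArrayList<>();
                runs.add(current);
            }
            current.add(tag);
            previous = tag;
        }
        return runs;
    }

    public static void main(String[] args) {
        // Tag names of the <body> children from the question, in document order
        List<String> tags = Arrays.asList("p", "p", "p", "img", "p", "p", "a", "p");
        System.out.println(groupRuns(tags));
        // prints [[p, p, p], [img], [p, p], [a], [p]]
    }
}
```

With jsoup, the input list could be built from the tagName() of each child element; only the runs whose tag is "p" would then need wrapping.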

"… find identical tag elements that are not separated by other tags …"

You can use a ListIterator to mutate an Elements collection.
Additionally, you can utilize the next and previous methods to adjust the iterator’s cursor.

Here, the element and next variables hold the current and following elements, respectively.

The idea is to move each p element into the current div element.
The div is inserted into the list only when there is no next element or the next element is not a p element.

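import java.util.ListIterator;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String string
  = "<body>\n" +
    "    <p>123124</p>\n" +
    "    <p>214565</p>\n" +
    "    <p>123333</p>\n" +
    "    <img src = \"aboba\"/>\n" +
    "    <p>faaaa</p>\n" +
    "    <p>aaaaa</p>\n" +
    "    <a href=\"src\"></a>\n" +
    "    <p>1111</p>\n" +
    "</body>";
Document document = Jsoup.parse(string);
Elements body = document.select("body").first().children();
ListIterator<Element> iterator = body.listIterator();
Element element, next, div = new Element("div").attr("class", "test");
while (iterator.hasNext()) {
    element = iterator.next();
    if (element.tagName().equalsIgnoreCase("p")) {
        // Take the <p> out of the list first, then move it into the current div.
        // This order stays correct in case list removal also detaches the element
        // from the DOM (an assumption about newer jsoup versions).
        iterator.remove();
        div.appendChild(element);
    }
    if (iterator.hasNext()) {
        // Peek at the following element without advancing the cursor
        next = iterator.next();
        iterator.previous();
        // A non-<p> element ends the run; skip empty divs (two non-<p> elements in a row)
        if (!next.tagName().equalsIgnoreCase("p") && !div.children().isEmpty()) {
            iterator.add(div);
            div = new Element("div").attr("class", "test");
        }
    } else if (!div.children().isEmpty()) {
        // End of the list: add the last run, if there is one
        iterator.add(div);
    }
}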

Here is the output for body.

<div class="test">
 <p>123124</p>
 <p>214565</p>
 <p>123333</p>
</div>
<img src="aboba">
<div class="test">
 <p>faaaa</p>
 <p>aaaaa</p>
</div>
<a href="src"></a>
<div class="test">
 <p>1111</p>
</div>
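
The listing above builds its result in the Elements list. If the wrapping should happen in the document tree itself, one possible sketch is shown below. It assumes that only direct children of one parent are considered and that text nodes between elements do not end a run; the class WrapParagraphRuns and the method wrapRuns are hypothetical names, not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class WrapParagraphRuns {
    // Wraps every run of consecutive <p> children of parent in <div class="test">
    static void wrapRuns(Element parent) {
        // Copy the children first so that moving nodes does not disturb the loop
        List<Element> children = new ArrayList<>(parent.children());
        Element wrapper = null;
        for (Element child : children) {
            if (child.normalName().equals("p")) {
                if (wrapper == null) {
                    wrapper = new Element("div").addClass("test");
                    child.before(wrapper);   // insert the wrapper where the run starts
                }
                wrapper.appendChild(child);  // moves the <p> into the wrapper
            } else {
                wrapper = null;              // any other element ends the run
            }
        }
    }

    public static void main(String[] args) {
        String html = "<p>123124</p><p>214565</p><img src=\"aboba\">"
                + "<p>faaaa</p><a href=\"src\"></a><p>1111</p>";
        Document document = Jsoup.parse(html);
        wrapRuns(document.body());
        System.out.println(document.body().html());
    }
}
```

Because the wrapper is inserted before the first p of each run and the paragraphs are then moved into it, the other elements keep their original positions in the tree.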
