JS: replacing all occurrences of a word in html with <span> element ONLY for p, span & divs. Not working if parent node contains the word

limeygent
February 15, 2023
2 Answers

I have this html:

<div>
hello world
<p>
the world is round
<img src="domain.com/world.jpg">
</p>
</div>

I want to replace the word "world" (in any letter case) with <span style='color:red;'>BARFOO</span>, but only inside <p>, <div>, and a few other specific elements.

The following code changes the text in the <div> but not in the <p>. A replace operation does run (on something), but the result does not show up in the browser's HTML.

If I pass only p to querySelectorAll and then repeat the call for div, it works fine.

My guess is that once the code processes the <div> and finds child elements, putting those elements back into the HTML string causes the element reference for the <p> to be lost.

jsfiddle is set up here https://jsfiddle.net/limeygent/t5q8ch23/12/ with more debug statements.

Any thoughts on what is happening & how to fix? (js only solution please)

var newspan = "<span style='color:red;'>BOOFAR</span>";

var regExNameSearch = new RegExp('World','gi');
var lc= 'World'.toLowerCase();

const elements = Array.from(document.querySelectorAll('p, span, div, strong, h1, h2, h3, h4')).filter(
          (element) => {
            for (let child of element.childNodes) {
              if (child.nodeType === Node.TEXT_NODE && child.textContent.toLowerCase().includes(lc)) {
                console.log('found ' + child.textContent);
                let parent = child.parentNode;
                let html = parent.innerHTML;

                // Find all the child elements in the element
                var excludeElements = parent.querySelectorAll('*');

                if (excludeElements.length == 0){
                    console.log('no child elements');
                    parent.innerHTML = parent.innerHTML.replace(regExNameSearch, newspan);
                    // (also tried this) parent.innerHTML = html;
                }else{

                    // Replace the text of each child element with placeholder
                    excludeElements.forEach(excludeElement => {
                        console.log('phase 1 - replacing - BEFORE');
                        html = html.replace(excludeElement.outerHTML, 'FOOBAR');
                        console.log('phase 1 - replacing - AFTER');
                    });
                    html = html.replace(regExNameSearch, newspan);

                    // Replace the text of each child element back to its original HTML
                    excludeElements.forEach(excludeElement => {
                        console.log('phase 2 - replacing - BEFORE:');
                        html = html.replace('FOOBAR', excludeElement.outerHTML);
                        console.log('phase 2 - replacing - AFTER:');
                    });

                    // Update the element's innerHTML with the updated HTML
                    parent.innerHTML = html;
                    
                }
                  return true;
              }
            }
            return false;
          }
        );

Edit: if you post an answer that recommends editing innerHTML, make sure it doesn't affect any child nodes. The code I present here got super complex because I had to avoid editing anything deeper inside the node.
Oh, and if you present recommendations from ChatGPT (while it can be useful), please test what you post first 😉

Tags: html javascript

Answers

Chosen as BEST ANSWER
- limeygent
- February 16, 2023 at 9:12 pm
- 0 votes
A friend explained that the NodeList "array" returned by querySelectorAll is static, which is why nodes were being missed or overwritten. His suggestion was to start at the lowest level of the DOM tree, perform the innerHTML replacement there, then work up the tree.

Hat tip to Rob for his explanation: document.querySelectorAll returns a static NodeList, which is accurate when the function is called but not after the document is changed. Using .innerHTML to replace "world" deletes and recreates all existing content in the tag, including the <p> tag and its contents. The <p> tag that is now on the page is a completely new one that isn't referenced by the node returned by document.querySelectorAll.
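
The stale-reference effect Rob describes can be reproduced in isolation. The sketch below is not the code from the fiddle: showStaleReference is a hypothetical helper, and it assumes it is handed a browser document. It takes a reference to a <p>, rewrites the parent's innerHTML, and reports what happened to that reference.

```javascript
// Hypothetical helper for illustration; expects a browser Document.
function showStaleReference(doc) {
  const div = doc.createElement("div");
  div.innerHTML = "hello world <p>the world is round</p>";

  // Reference taken before the rewrite, like an entry in the querySelectorAll result.
  const before = div.querySelector("p");

  // Assigning innerHTML discards the old children and parses brand-new ones.
  div.innerHTML = div.innerHTML.replace(/world/gi, "BARFOO");

  const after = div.querySelector("p");
  return {
    sameNode: before === after,
    oldNodeStillInDiv: div.contains(before),
    oldNodeText: before.textContent,
    newNodeText: after.textContent,
  };
}

// Only runs where a DOM is available (for example the browser console).
if (typeof document !== "undefined") {
  console.log(showStaleReference(document));
}
```

In a browser, sameNode and oldNodeStillInDiv should both come back false: the old <p> is detached, so any later change made through that reference never reaches the page.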

querySelectorAll returns an "array" (strictly a NodeList, but the term is used loosely in this answer) whose elements are in document order, that is, depth-first pre-order traversal. See https://en.wikipedia.org/wiki/Tree_traversal for more on tree traversal methods.

I needed to start at the lowest levels of the tree so that rewriting a parent's innerHTML would not invalidate references to child nodes that still had to be processed.
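
To see why reversing the list is enough, here is a small sketch on a plain object tree rather than the real DOM (the tree shape and the preOrder helper are made up for this illustration). In pre-order an element always comes before its descendants, so in the reversed list every descendant comes before its ancestors.

```javascript
// Made-up stand-in for the sample HTML: a div containing a p containing an img.
const tree = {
  name: "div",
  children: [{ name: "p", children: [{ name: "img", children: [] }] }],
};

// Depth-first pre-order: visit the node first, then its children left to right.
function preOrder(node, out = []) {
  out.push(node.name);
  for (const child of node.children) preOrder(child, out);
  return out;
}

const documentOrder = preOrder(tree);          // ancestors before descendants
const bottomUp = [...documentOrder].reverse(); // descendants before ancestors

console.log(documentOrder.join(" > "));
console.log(bottomUp.join(" > "));
```

With the reversed ordering, the <p> is rewritten while its reference is still live, and only afterwards is the enclosing <div> rebuilt.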

Here is the change:

(old)

const elements = Array.from(document.querySelectorAll('p, span, div, strong, h1, h2, h3, h4')).filter(

(new)

const elements = Array.from(document.querySelectorAll('p, span, div, strong, h1, h2, h3, h4')).reverse().filter(

On the sample html code in this question, and some other variations, it works fine. I'll continue to test further.

Comments / pitfalls welcomed.

New fiddle https://jsfiddle.net/9vwo6a3q/

(Edit)

You can use the TreeWalker API to achieve the desired results.

The essential logic is this:

Iterate text nodes that meet the specified criteria: the text content matches the case-insensitive regular expression pattern and the node is the direct child (or, if desired, a descendant) of an element that matches your selector.

For each matched text node: split the node's text content on the regular expression pattern, insert a replacement for each resulting string as described below, and then remove the matched node from its parent.

Before each string except the first, insert a copy of your substitute <span> node into the parent (just before the matched node). Then, if the string is non-empty, insert it at the same position as a new text node.
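
The splitting step can be tried out without a DOM. The following sketch mirrors the logic described above on plain strings. planReplacement and its {kind, value} output shape are hypothetical names for this illustration, and it assumes the pattern has no capturing groups (String.prototype.split would otherwise include the captured text in its result).

```javascript
// Hypothetical helper: returns the sequence of pieces that would replace one text node.
// Assumes `regexp` has no capturing groups.
function planReplacement(text, regexp) {
  const parts = text.split(regexp);
  const plan = [];
  parts.forEach((part, index) => {
    // Every boundary between two parts is one match, so it gets one substitute.
    if (index > 0) plan.push({ kind: "substitute" });
    // Empty strings appear when a match sits at the start or end of the text.
    if (part.length > 0) plan.push({ kind: "text", value: part });
  });
  return plan;
}

console.log(planReplacement("hello world", /world/i));
console.log(planReplacement("World of worlds", /world/i));
```

Each "substitute" entry corresponds to one inserted <span>, and each "text" entry to one new text node.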

TS Playground

function assert (expr: unknown, msg?: string): asserts expr {
  if (!expr) throw new Error(msg);
}

function createTextNodeFilterFn (regexp: RegExp, ancestorSelector: string): (textNode: Text) => number {
  return ((textNode: Text): number => {
    if (!(
      textNode.textContent
      && regexp.test(textNode.textContent)
    )) return NodeFilter.FILTER_REJECT;

    // To find any matching ancestor (not just the direct parent):
    // const valid = Boolean(textNode.parentElement?.closest(ancestorSelector));
    const valid = textNode.parentElement?.matches(ancestorSelector);
    if (valid) return NodeFilter.FILTER_ACCEPT;

    return NodeFilter.FILTER_REJECT;
  });
}

function createSubstituteNode (): HTMLSpanElement {
  const span = document.createElement("span");
  span.textContent = "BARFOO";
  span.style.setProperty("color", "red");
  return span;
}

function transformTextNode (node: Node, regexp: RegExp): void {
  const {parentNode, textContent} = node;
  assert(parentNode, "Parent node not found");
  assert(textContent, "Text content not found");

  const iter = textContent.split(regexp)[Symbol.iterator]();

  const firstResult = iter.next();
  if (firstResult.done) return;
  if (firstResult.value.length > 0) {
    parentNode.insertBefore(new Text(firstResult.value), node);
  }

  for (const str of iter) {
    parentNode.insertBefore(createSubstituteNode(), node);
    if (str.length === 0) continue;
    parentNode.insertBefore(new Text(str), node);
  }

  parentNode.removeChild(node);
}

function main () {
  const TARGET_REGEXP = /world/i;
  const TARGET_SELECTOR = "div, h1, h2, h3, h4, p, span, strong";

  const tw = document.createTreeWalker(
    document.body,
    NodeFilter.SHOW_TEXT,
    {acceptNode: createTextNodeFilterFn(TARGET_REGEXP, TARGET_SELECTOR)},
  );

  let node = tw.nextNode();

  while (node) {
    // Advance the TreeWalker's iterator state before mutating the current node:
    const memo = node;
    node = tw.nextNode();
    transformTextNode(memo, TARGET_REGEXP);
  }
}

main();

The TS code above, compiled to plain JavaScript in a runnable snippet:

"use strict";
function assert(expr, msg) {
    if (!expr)
        throw new Error(msg);
}
function createTextNodeFilterFn(regexp, ancestorSelector) {
    return ((textNode) => {
        if (!(textNode.textContent
            && regexp.test(textNode.textContent)))
            return NodeFilter.FILTER_REJECT;
        // To find any matching ancestor (not just the direct parent):
        // const valid = Boolean(textNode.parentElement?.closest(ancestorSelector));
        const valid = textNode.parentElement?.matches(ancestorSelector);
        if (valid)
            return NodeFilter.FILTER_ACCEPT;
        return NodeFilter.FILTER_REJECT;
    });
}
function createSubstituteNode() {
    const span = document.createElement("span");
    span.textContent = "BARFOO";
    span.style.setProperty("color", "red");
    return span;
}
function transformTextNode(node, regexp) {
    const { parentNode, textContent } = node;
    assert(parentNode, "Parent node not found");
    assert(textContent, "Text content not found");
    const iter = textContent.split(regexp)[Symbol.iterator]();
    const firstResult = iter.next();
    if (firstResult.done)
        return;
    if (firstResult.value.length > 0) {
        parentNode.insertBefore(new Text(firstResult.value), node);
    }
    for (const str of iter) {
        parentNode.insertBefore(createSubstituteNode(), node);
        if (str.length === 0)
            continue;
        parentNode.insertBefore(new Text(str), node);
    }
    parentNode.removeChild(node);
}
function main() {
    const TARGET_REGEXP = /world/i;
    const TARGET_SELECTOR = "div, h1, h2, h3, h4, p, span, strong";
    const tw = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT, { acceptNode: createTextNodeFilterFn(TARGET_REGEXP, TARGET_SELECTOR) });
    let node = tw.nextNode();
    while (node) {
        // Advance the TreeWalker's iterator state before mutating the current node:
        const memo = node;
        node = tw.nextNode();
        transformTextNode(memo, TARGET_REGEXP);
    }
}
main();

<div>
  hello world
  <p>
    the world is round
    <img src="domain.com/world.jpg">
  </p>
</div>
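
One pitfall if you swap in the regular expression from the question (created with the 'gi' flags): a RegExp with the g flag keeps state in lastIndex between test() calls, so reusing it inside the filter function can skip matching text nodes. This is a small standalone check, not part of the code above:

```javascript
const globalRe = new RegExp("World", "gi"); // flags as in the question
const plainRe = /world/i;                   // flags as in the TreeWalker code

// With `g`, test() resumes from lastIndex, so repeated calls alternate.
const globalResults = [1, 2, 3].map(() => globalRe.test("hello world"));

// Without `g`, every call starts searching from index 0.
const plainResults = [1, 2, 3].map(() => plainRe.test("hello world"));

console.log(globalResults, plainResults);
```

Keeping the pattern non-global, as the TreeWalker code does, avoids the problem; split() still splits on every occurrence regardless of the g flag.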
