I have this html:
<div>
hello world
<p>
the world is round
<img src="domain.com/world.jpg">
</p>
</div>
And want to replace the word "world" (or mixed-case variants thereof) with <span style='color:red;'>BARFOO</span>, but only in <p>, <div> and a few other specific elements.
In the following code, it changes the text in the <div>, but not in the <p>. A replace operation is done (on something), but does not show up in the browser's html.
If I just supply p to querySelectorAll, then repeat again for <div>, it works fine.
I am thinking that once the code processes the <div> and finds that it has one or more child elements, when those elements are put back into the html string, the element reference for the <p> is lost.
jsfiddle is set up here https://jsfiddle.net/limeygent/t5q8ch23/12/ with more debug statements.
Any thoughts on what is happening & how to fix? (js only solution please)
var newspan = "<span style='color:red;'>BOOFAR</span>";
var regExNameSearch = new RegExp('World', 'gi');
var lc = 'World'.toLowerCase();
const elements = Array.from(document.querySelectorAll('p, span, div, strong, h1, h2, h3, h4')).filter(
  (element) => {
    for (let child of element.childNodes) {
      if (child.nodeType === Node.TEXT_NODE && child.textContent.toLowerCase().includes(lc)) {
        console.log('found ' + child.textContent);
        let parent = child.parentNode;
        let html = parent.innerHTML;
        // Find all the child elements in the element
        var excludeElements = parent.querySelectorAll('*');
        if (excludeElements.length == 0) {
          console.log('no child elements');
          parent.innerHTML = parent.innerHTML.replace(regExNameSearch, newspan);
          // (also tried this) parent.innerHTML = html;
        } else {
          // Replace the text of each child element with placeholder
          excludeElements.forEach(excludeElement => {
            console.log('phase 1 - replacing - BEFORE');
            html = html.replace(excludeElement.outerHTML, 'FOOBAR');
            console.log('phase 1 - replacing - AFTER');
          });
          html = html.replace(regExNameSearch, newspan);
          // Replace the text of each child element back to its original HTML
          excludeElements.forEach(excludeElement => {
            console.log('phase 2 - replacing - BEFORE:');
            html = html.replace('FOOBAR', excludeElement.outerHTML);
            console.log('phase 2 - replacing - AFTER:');
          });
          // Update the element's innerHTML with the updated HTML
          parent.innerHTML = html;
        }
        return true;
      }
    }
    return false;
  }
);
Edit: if you supply an answer recommending editing the innerHTML, make sure it doesn't affect any child nodes. The code I present here got super complex because I had to avoid editing anything further inside the node.
Oh, and if you present recommendations from ChatGPT (while it can be useful), please test what you post first 😉
2 Answers
A friend explained that the NodeList "array" returned by querySelectorAll is static, which is why nodes were being missed or overwritten. The suggestion was to start at the lowest level of the DOM tree, perform the innerHTML replacement, then work up the tree.
Hat tip to Rob for his explanation:
document.querySelectorAll returns a static NodeList which is accurate when the function is called but isn't accurate if the document is changed. Using .innerHTML to make the replacement of "world" deletes and recreates all existing content in the tag, including the <p> tag and its contents. The <p> tag that is now on the page is a completely new one that isn't referenced by the node returned by document.querySelectorAll.
querySelectorAll returns an "array" (not quite, but the term is used loosely for purposes of this answer) in document order, that is, a depth-first, pre-order traversal. Read more about tree traversal methods at https://en.wikipedia.org/wiki/Tree_traversal.
I needed to start at the lowest levels of the node array so as to not mangle any references to child nodes.
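To make the ordering argument concrete, here is a minimal sketch that needs no browser. It models the question's markup as a plain object tree (the `tree` object and the `preOrder` helper are hypothetical names used only for illustration) and shows that reversing a pre-order listing puts every descendant ahead of its ancestors:

```javascript
// Plain-object stand-in for the question's markup (a hypothetical model, not real DOM):
// <div> hello world <p> the world is round <img> </p> </div>
const tree = {
  tag: 'div',
  children: [{ tag: 'p', children: [{ tag: 'img', children: [] }] }],
};

// Depth-first, pre-order walk: record a node first, then its children in order.
// This mirrors the document order in which querySelectorAll lists its matches.
function preOrder(node, out = []) {
  out.push(node.tag);
  for (const child of node.children) preOrder(child, out);
  return out;
}

const documentOrder = preOrder(tree);
const reversedOrder = [...documentOrder].reverse();

console.log(documentOrder); // [ 'div', 'p', 'img' ]
console.log(reversedOrder); // [ 'img', 'p', 'div' ]
```

In pre-order an ancestor always comes before its descendants, so in the reversed list the <p> is handled before the <div> that contains it. By the time the outer element's innerHTML is rewritten, the inner element has already been processed and its updated markup is simply carried along.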
Here is the change:
(old)
const elements = Array.from(document.querySelectorAll('p, span, div, strong, h1, h2, h3, h4')).filter(
(new)
const elements = Array.from(document.querySelectorAll('p, span, div, strong, h1, h2, h3, h4')).reverse().filter(
On the sample html code in this question, and some other variations, it works fine. I'll continue to test further.
Comments / pitfalls welcomed.
New fiddle https://jsfiddle.net/9vwo6a3q/
You can use the TreeWalker API to achieve the desired results.
The essential logic is this:
Iterate text nodes that meet the specified criteria: the text content matches the case-insensitive regular expression pattern and the node is the direct child (or, if desired, a descendant) of an element that matches your selector.
For each matched text node: remove it from its parent, but first split the node's text content on the regular expression pattern, and for each resulting string: insert it as a new text node and, where a match followed it, create a <span> node and insert it as well.
TS Playground
The TS code above, compiled to plain JavaScript in a runnable snippet:
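The answer's own snippet is not reproduced here. What follows is a minimal sketch of the same idea, not the answerer's code: the function names `splitOnMatches` and `highlightWord` and their parameters are assumptions made for illustration. The splitting step is a pure function, so it can be checked outside a browser. The DOM step only edits text nodes, so sibling elements such as the <img> are never re-serialized.

```javascript
// Pure helper (hypothetical name): split text into alternating plain segments
// and matches. Plain segments land at even indices, matches at odd indices.
function splitOnMatches(text, word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // A capturing group makes String.prototype.split keep the matched text.
  return text.split(new RegExp(`(${escaped})`, 'gi'));
}

// DOM part (browser only, hypothetical name): replace matches inside text nodes
// whose parent element matches `selector`, leaving element nodes untouched.
function highlightWord(root, selector, word, replacementText) {
  const lower = word.toLowerCase();
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
    acceptNode: (node) =>
      node.parentElement?.matches(selector) &&
      node.data.toLowerCase().includes(lower)
        ? NodeFilter.FILTER_ACCEPT
        : NodeFilter.FILTER_REJECT,
  });

  // Collect first, then mutate, so the walk is not disturbed by the edits.
  const textNodes = [];
  while (walker.nextNode()) textNodes.push(walker.currentNode);

  for (const textNode of textNodes) {
    const fragment = document.createDocumentFragment();
    splitOnMatches(textNode.data, word).forEach((part, i) => {
      if (i % 2 === 1) {
        // Odd index: this part is a match, so emit the styled replacement.
        const span = document.createElement('span');
        span.style.color = 'red';
        span.textContent = replacementText;
        fragment.append(span);
      } else if (part !== '') {
        fragment.append(part); // strings are appended as text nodes
      }
    });
    textNode.replaceWith(fragment);
  }
}

console.log(splitOnMatches('hello World, wide world', 'world'));
```

In a browser, a call along the lines of `highlightWord(document.body, 'p, span, div, strong, h1, h2, h3, h4', 'world', 'BOOFAR')` would apply it to the markup from the question. Because no innerHTML is written, element references held elsewhere stay valid.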