I have a string with bbcode in it: "[area=A]A[/area] very, [area=good]good[/area] string."
I need to add bbcode to any word that doesn’t already have it. For example, the string above should look like:
"[area=A]A[/area] [area=very]very[/area], [area=good]good[/area] [area=string]string[/area]."
It also needs to work with accent marks / diacritics (e.g., aquí).
String "[area=A]A[/area] very, [area=good]good[/area] string."
outputs the following as expected:
[area=A]A[/area] [area=very]very[/area], [area=good]good[/area] [area=string]string[/area].
However, string "A good string. [area=A]A[/area] very, [area=good]good[/area] string."
outputs:
[area=A]A[/area] [area=good]good[/area] [area=[area=string]string[/area]]string[/area]. [area=[area=A]A[/area]]A[/area] [area=very]very[/area], [area=[area=good]good[/area]]good[/area] [area=[area=string]string[/area]]string[/area].
I believe this is because the substring "a" is causing problems.
Question: I’ve tried various approaches with regex and boundaries. I think the best solution is a regex that ignores anything inside [area]word here[/area] tags, as well as the word that’s part of the opening tag: [area=word here]. How can I solve this?
Code:
var text = `[area=A]A[/area] very, [area=good]good[/area] string.`
var text2 = text
text.split(/[,\s.¡!¿?]+/)
  .filter((v) => {
    // Keep only chunks that are not already tagged
    if (!v.includes('[area=')) {
      return v
    }
  })
  .forEach((v) => {
    let newFormat = `[area=${v}]${v}[/area]`
    let regEx = new RegExp('(^|\\W)' + v + '(\\W|$)', 'gi') // Custom word boundary
    text2 = text2.replace(regEx, function ($0, $1, $2) {
      return $1 + newFormat + $2
    })
  })
console.log(text2)
2 Answers
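You need the "best regex trick" and the u flag. Match the parts you want to leave alone (the existing tags) in a first capturing group, and match bare words with a second alternative. If the first group matched, we just put it back into the string unchanged. Otherwise, we may format the match however we please.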
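A minimal sketch of that idea, under a few assumptions: the function name tagUntaggedWords is made up for illustration, a "word" is taken to be a run of Unicode letters, combining marks and digits, and [area] tags are assumed not to nest. The first alternative captures a whole [area=...]...[/area] block, and the second matches a bare word.

```javascript
// Hypothetical helper name; not taken from the answer itself.
function tagUntaggedWords(text) {
  // Alternative 1 (captured): an existing [area=...]...[/area] block, kept verbatim.
  // Alternative 2 (not captured): a bare word made of Unicode letters, marks, digits.
  const pattern = /(\[area=[^\]]*\][\s\S]*?\[\/area\])|[\p{L}\p{M}\p{N}]+/gu;

  return text.replace(pattern, (match, tagged) =>
    // If group 1 took part in the match, put it back untouched;
    // otherwise wrap the bare word in a new tag.
    tagged !== undefined ? tagged : `[area=${match}]${match}[/area]`
  );
}

console.log(
  tagUntaggedWords('A good string. [area=A]A[/area] very, [area=good]good[/area] string.')
);
// [area=A]A[/area] [area=good]good[/area] [area=string]string[/area]. [area=A]A[/area] [area=very]very[/area], [area=good]good[/area] [area=string]string[/area].

console.log(tagUntaggedWords('aquí'));
// [area=aquí]aquí[/area]
```

Because the whole string is scanned in a single pass, a tag that has just been inserted is never scanned again, which is what produced the nested tags in the question's output.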
From what you describe, it looks like you should leave alone any word that is directly preceded or followed by a bracket.
Try a regex that combines a whole-word match with bracket lookarounds and see if it works for you. Match it and replace each match with
[area=$1]$1[/area]
Here, the \b(\w+)\b part avoids partial matches of a word, and the (?<![\[\]]) and (?![\[\]]) parts reject words that are either preceded or followed by [ or ].
Let me know if this works, or provide more samples and I can update the regex to cover them. And feel free to change \w to the char set you prefer.
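Put together, the pieces might look like the sketch below. The order in which the three parts are combined is an assumption, and so are the variable names. Keep in mind that in JavaScript \w and \b only know about ASCII letters, digits and underscore, so the second pattern shows one possible way to swap in a Unicode char set for words like aquí (it needs the u flag, and both patterns need lookbehind support).

```javascript
// Assumed assembly of the parts described above:
// not preceded by [ or ], a whole word, not followed by [ or ].
const bareWord = /(?<![\[\]])\b(\w+)\b(?![\[\]])/g;

// Possible Unicode variant: \b is replaced by lookarounds that also
// refuse to start or stop in the middle of a run of letters/marks/digits.
const bareWordUnicode =
  /(?<![\[\]\p{L}\p{M}\p{N}])([\p{L}\p{M}\p{N}]+)(?![\[\]\p{L}\p{M}\p{N}])/gu;

const replacement = '[area=$1]$1[/area]';

const sample = 'A good string. [area=A]A[/area] very, [area=good]good[/area] string.';
console.log(sample.replace(bareWord, replacement));
// [area=A]A[/area] [area=good]good[/area] [area=string]string[/area]. [area=A]A[/area] [area=very]very[/area], [area=good]good[/area] [area=string]string[/area].

console.log('[area=aquí]aquí[/area] está'.replace(bareWordUnicode, replacement));
// [area=aquí]aquí[/area] [area=está]está[/area]
```

One limitation to be aware of: only words that directly touch a bracket are skipped, so a multi-word tag value such as [area=word here] would still have its first word wrapped.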