I have a string with bbcode in it: "[area=A]A[/area] very, [area=good]good[/area] string."

I need to add bbcode to any word that doesn’t already have it. For example, the string above should look like:

"[area=A]A[/area] [area=very]very[/area], [area=good]good[/area] [area=string]string[/area]."

It also needs to work with accent marks / diacritics (e.g., aquí).


String "[area=A]A[/area] very, [area=good]good[/area] string." outputs the following as expected:

[area=A]A[/area] [area=very]very[/area], [area=good]good[/area] [area=string]string[/area].

However, string "A good string. [area=A]A[/area] very, [area=good]good[/area] string." outputs:

[area=A]A[/area] [area=good]good[/area] [area=[area=string]string[/area]]string[/area]. [area=[area=A]A[/area]]A[/area] [area=very]very[/area], [area=[area=good]good[/area]]good[/area] [area=[area=string]string[/area]]string[/area].

I believe this is because substring "a" is causing problems.

Question: I’ve tried various approaches with regex and boundaries. I think the best solution is a regex that ignores anything inside [area]word here[/area] tags, as well as the word that’s part of the tag itself: [area=word here]. How can I solve this?


Code:

var text = `[area=A]A[/area] very, [area=good]good[/area] string.`

var text2 = text

text.split(/[,\s.¡!¿?]+/)
    .filter((v) => {
        if (!v.includes('[area=')) {
            return v
        }
    })
    .forEach((v) => {
        let newFormat = `[area=${v}]${v}[/area]`
        let regEx = new RegExp('(^|\\W)' + v + '(\\W|$)', 'gi') // Custom word boundary
        text2 = text2.replace(regEx, function ($0, $1, $2) {
            return $1 + newFormat + $2
        })
    })

console.log(text2)

2 Answers


  1. You need the best regex trick and the u flag:

    /
      (             # 
        \[area=     # Match '[area=',
        (\p{L}+)    # a capturing group consisting of 1+ letters (any language)
        \]          # and ']',
        \2          # followed by the word we just captured, then
        \[\/area\]  # '[/area]'
      )             # 
      |             # or
      \p{L}+        # a word.
    /gu
    

    If the first group matches, the word is already tagged and we just put it back into the string unchanged. Otherwise the match is a bare word, which we may format however we please:

    string.replace(
       regex,
       ($0, $1) => $1 ? $1 : `[area=${$0}]${$0}[/area]`
    )
    

    Try it:

    const regex = /(\[area=(\p{L}+)\]\2\[\/area\])|\p{L}+/gu;
    
    const string = `
    [area=A]A[/area] very, [area=good]good[/area] string aquí.
    A good string. [area=A]A[/area] very, [area=good]good[/area] string.
    `;
    
    console.log(
      string.replace(
        regex,
        ($0, $1) => $1 ? $1 : `[area=${$0}]${$0}[/area]`
      )
    );
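
    As a rough sketch of how the replacement above might be packaged for reuse, the helper below wraps it in a function and runs it twice on the same text. The name tagWords and the sample sentence are assumptions made for illustration, not part of the original answer. Because tagged words are put back unchanged, a second pass should leave the string as it is:

    ```javascript
    // Hypothetical helper; the name tagWords is illustrative only.
    // Group 1 matches an already-tagged word such as [area=good]good[/area];
    // the alternative \p{L}+ matches any bare run of letters.
    const TAGGED_OR_WORD = /(\[area=(\p{L}+)\]\2\[\/area\])|\p{L}+/gu;

    function tagWords(text) {
      return text.replace(TAGGED_OR_WORD, (match, tagged) =>
        // Keep tagged words untouched; wrap everything else.
        tagged ? tagged : `[area=${match}]${match}[/area]`
      );
    }

    // \u00ed is the precomposed letter í, as in "aquí".
    const once = tagWords('A good string. [area=A]A[/area] very, aqu\u00ed.');
    const twice = tagWords(once);

    console.log(once);
    console.log(once === twice); // a second pass changes nothing
    ```

    One thing to keep in mind: the back-reference \2 requires the attribute and the content to be identical, so a tag such as [area=a]A[/area] would not be recognised as already tagged by this pattern.
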
  2. From what you describe, it looks like you should skip any word that has a bracket immediately before or after it.

    Try this regex and see if it works for you:

    (?<![\[\]])\b(\w+)\b(?![\[\]])
    

    Match this regex and replace it with [area=$1]$1[/area]

    Here, the \b(\w+)\b part avoids partial matches within a word, and the (?<![\[\]]) and (?![\[\]]) parts reject words that are immediately preceded or followed by either [ or ].

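    Let me know if this works, or provide more samples and I can update the regex to cover them. Feel free to change \w to whatever character set you prefer; note that \w is ASCII-only in JavaScript, so accented words such as aquí need something like \p{L} with the u flag.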
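
    Since the question asks for diacritics support, here is a minimal sketch of the same lookaround idea with \w swapped for \p{L}. This is an adaptation, not the regex from the answer: the constant name BARE_WORD and the sample input are assumptions, and \p{L} is added inside each lookaround to stand in for \b, which is ASCII-based in JavaScript.

    ```javascript
    // Sketch only: a Unicode-aware variant of the lookaround idea.
    // \p{L} needs the u flag; the extra \p{L} inside each lookaround stops
    // the match from starting or ending in the middle of a word.
    const BARE_WORD = /(?<![\[\]\p{L}])(\p{L}+)(?![\[\]\p{L}])/gu;

    // \u00ed is the precomposed letter í, as in "aquí".
    const input = '[area=A]A[/area] very, [area=good]good[/area] string aqu\u00ed.';
    const output = input.replace(BARE_WORD, '[area=$1]$1[/area]');

    console.log(output);
    ```

    Like the original pattern, this skips any word that touches a bracket, so a bare word written directly next to [ or ] in ordinary text would be left untagged.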
