
My string contains several kinds of tags. I need to match them with a regex and then wrap each match in some extra tags.

Let’s say I have this string:

Lorem ipsum <key="right_outline"> sed [heal] dolor {sit} amet, {0:%s} consectetur {BBBw}
{/BBBw} adipiscing <color=#CC294B></color> elit. Sed [copd][cc] lobortis mauris. 

So I need a regex that matches every tag of the form […], {…} or <…>.

I also have to wrap each tag, or each run of adjacent tags, in <my-tag></my-tag>,
as in this example:

Lorem ipsum <my-tag><key="right_outline"></my-tag> sed <my-tag>[heal]</my-tag> 
dolor <my-tag>{sit}</my-tag> amet, <my-tag>{0:%s}</my-tag> consectetur 
<my-tag>{BBBw}{/BBBw}</my-tag> adipiscing <my-tag><color=#CC294B></color></my-tag> elit. 
Sed <my-tag>[copd][cc]</my-tag> lobortis mauris. 

I am new to regex, so I have been testing with this:

let regex_tags = new RegExp("[.*?]|{.*?}|<.*?>"); 

And for the JavaScript part that adds the wrapping tags:

a = 'my string with all the weird tags on it';
r = new RegExp("[.*?]|{.*?}|<.*?>");
b = a.replace(r, `<my-tag>$&</my-tag>`);

I use $& in the replacement string passed to .replace() so that each match is preserved unmodified and only wrapped.

Questions:

  1. Do any of you know a regex to achieve this?
  2. Is the .replace() method the best solution?

Thanks a lot for the help

2 Answers


  1. Here is an example that retrieves all the HTML tags.

    Important: if you want to modify, validate, or do anything more complex with that HTML content, I would recommend transforming the HTML string into actual HTML nodes and working with those, as shown in the example code.
    The example also wraps my-tag in a wrapper div.

    Happy coding! 😀

    const htmlText = 'Lorem ipsum <my-tag><key="right_outline"></my-tag> sed <my-tag>[heal]</my-tag> dolor <my-tag>{sit}</my-tag> amet, <my-tag>{0:%s}</my-tag> consectetur <my-tag>{BBBw}{/BBBw}</my-tag> adipiscing <my-tag><color=#CC294B></color></my-tag> elit. Sed <my-tag>[copd][cc]</my-tag> lobortis mauris. Lorem ipsum <key="right_outline"> sed [heal] dolor {sit} amet, {0:%s} consectetur {BBBw}{/BBBw} adipiscing <color=#CC294B></color> elit. Sed [copd][cc] lobortis mauris.';
    
    const tagsRegex = /<\/?[a-z][^>]*>/gi;
    
    /** get tags */
    const tags = [...htmlText.matchAll(tagsRegex)];
    document.querySelector(".tags").innerText = 
    JSON.stringify(tags.flat());
    
    // transforms the text into html nodes and wrap my tag with wrapperdiv
    const div = document.createElement("div");
    div.innerHTML = htmlText;
    const myTagElement = div.querySelector("my-tag");
    
    const wrapperDiv = document.createElement("div");
    wrapperDiv.className = "wrapper-div";
    div.insertBefore(wrapperDiv, myTagElement);
    wrapperDiv.appendChild(myTagElement);
    
    document.querySelector(".edited-html").innerText = div.innerHTML;
    <div class="tags"></div>
        <div>
            ------
        </div>
    <div class="edited-html"></div>
  2. You’re very nearly there:

    • you want to be a bit more aggressive with your escaping: [ ] and { } need backslashes in the pattern, and each backslash has to be doubled inside a string literal passed to new RegExp
    • you’ll need to specify global matching (otherwise the pattern will only match the first instance)
    • if I understand your syntax correctly, you just want to collapse adjacent "my-tag" instances, which you can do by putting your pattern in a capturing group and adding a +:
    r = new RegExp("(\\[.*?\\]|\\{.*?\\}|<.*?>)+", "g");
    

    Output:

    Lorem ipsum <my-tag><key="right_outline"></my-tag> sed <my-tag>[heal]</my-tag>
    dolor <my-tag>{sit}</my-tag> amet, <my-tag>{0:%s}</my-tag> consectetur
    <my-tag>{BBBw}{/BBBw}</my-tag> adipiscing <my-tag><color=#CC294B></color></my-tag> elit.
    Sed <my-tag>[copd][cc]</my-tag> lobortis mauris. 
    

    It gets somewhat more complicated if you were to introduce extra constraints on matching opening and closing tags etc but if you’re just hoping to tag your own tags, as it were, then this appears to do what you wanted.
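As a rough end-to-end check of the points above, here is a minimal sketch that runs the pattern against the string from the question. It assumes the two BBBw tags sit on one line, and it uses a regex literal (with a non-capturing group) so no double escaping is needed. The helper name wrapTags is made up for illustration.

```javascript
// Input from the question, with {BBBw}{/BBBw} on a single line
// (assumption: the line break in the question is not significant).
const input =
  'Lorem ipsum <key="right_outline"> sed [heal] dolor {sit} amet, {0:%s} consectetur ' +
  '{BBBw}{/BBBw} adipiscing <color=#CC294B></color> elit. Sed [copd][cc] lobortis mauris.';

// One or more directly adjacent [...], {...} or <...> tags, matched globally.
// A regex literal avoids the doubled backslashes a string pattern would need.
const tagRun = /(?:\[.*?\]|\{.*?\}|<.*?>)+/g;

// wrapTags is a hypothetical helper name, not something from the original post.
function wrapTags(text) {
  // $& inserts the whole match unchanged, so the tags are only wrapped.
  return text.replace(tagRun, '<my-tag>$&</my-tag>');
}

const wrapped = wrapTags(input);
console.log(wrapped);
```

Text without any tags passes through unchanged, because .replace() only touches the matched runs.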

    @sln has pointed out that there may be some whitespace suppression going on. I’m not clear whether that newline between your {BBBw} and {/BBBw} is intentional, because it’s not reflected in your desired output. This answer ignores it; perhaps you can clarify?
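If the newline does turn out to be intentional, one possible variant is sketched below. It rests on an assumption the question does not confirm: that line breaks between adjacent tags should be tolerated while matching and dropped from the output. The names TAG, tagRunAcrossLines and wrapTagsAcrossLines are invented for this sketch.

```javascript
// Source text for a single [...], {...} or <...> tag.
const TAG = String.raw`\[.*?\]|\{.*?\}|<.*?>`;

// A tag followed by any number of further tags, each optionally preceded
// by line breaks. Spaces between tags still end the run.
const tagRunAcrossLines = new RegExp(
  String.raw`(?:${TAG})(?:[\r\n]*(?:${TAG}))*`,
  'g'
);

// Hypothetical helper: wraps each run and removes the line breaks inside it.
function wrapTagsAcrossLines(text) {
  return text.replace(
    tagRunAcrossLines,
    (run) => '<my-tag>' + run.replace(/[\r\n]+/g, '') + '</my-tag>'
  );
}

console.log(wrapTagsAcrossLines('consectetur {BBBw}\n{/BBBw} adipiscing'));
```

A replacement function is used here instead of $& because the match has to be altered (line breaks stripped) before it is wrapped.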
