
I have a string with nested HTML comments, structured like this: <!-- some text <!-- some text --> -->. I want to remove every comment tag that is inside another comment, so that the final result is <!-- some text some text -->.

This is the regex pattern I used, but it does not match the comments the way I want:

<!--[\s\S]+?(<!--[\s\S]+?-->)[\s\S]+?-->

I am applying it to this string:

<!-- {% set Navbar = request.Navbar if request.Navbar else "True" %} -->
<!-- {% if Navbar == "True" %}
<div class="navbar navbar-default navbar-static-top">
    <form method="GET" id="frm" name="frm" role="form">
        <div class="col-sm-8">
            <div style="margin-top: 8px;">
                <div class="col-sm-3" style="padding-left: 0;">
                    <input name="From" id="From" value="{{ From }}" class="form-control" placeholder="Effective From *" style="width: 100%;" required data-provide="datepicker" data-date-format="yyyy-mm-dd">
                </div>
                <div class="col-sm-3" style="padding-left: 0;">
                    <input name="To" id="To" value="{{ To }}" class="form-control" placeholder="Effective To *" style="width: 100%;" required data-provide="datepicker" data-date-format="yyyy-mm-dd">
                </div>
                <div class="col-sm-2" style="padding-left: 0;">
                    <select name="Active" class="form-control" style="width: 100%;">
                        <!-- {% if Active == 'Y' %}
                            <option selected value="Y">Active (Yes)</option>
                            <option value="N">Active (No)</option>
                        {% else %}
                            <option value="Y">Active (Yes)</option>
                            <option selected value="N">Active (No)</option>
                        {% endif %} -->
                    </select>
                </div>

                <!-- {% set CurrencyList = getRecord('MKT_CURRENCY') %} -->
                <!-- {% if len(CurrencyList) > 0 %}
                    <div class="col-sm-2" style="padding-left: 0;">
                        <select name="FilterCurrency" class="form-control" style="width: 100%;">
                        <!-- {% for CurrObj in CurrencyList %}
                            <option {{'selected' if FilterCurrency == CurrObj.ID else ''}}  value="{{ CurrObj.ID }}">{{ CurrObj.ID }}</option>
                        {% endfor %} -->
                        </select>
                    </div>
                {% endif %} -->
                <label class="col-sm-1" style="padding:0px;">
                    <button type="submit" class="btn btn-flat btn-labeled btn-primary" style="padding: 7px 25px;">
                        Show
                    </button>
                </label>
            </div>
        </div>
        <div class="col-sm-4" style="margin-top: 8px;">
            <div class="pull-right">
                <a id="print" onclick="" class="btn btn-flat btn-labeled btn-success">
                    <span class="btn-label icon fa fa-print"></span> Print
                </a>
            </div>
        </div>
    </form>
</div>
{% endif %} -->

After removing the nested comments, the result should be:

<!-- {% set Navbar = request.Navbar if request.Navbar else "True" %} -->
<!-- {% if Navbar == "True" %}
<div class="navbar navbar-default navbar-static-top">
    <form method="GET" id="frm" name="frm" role="form">
        <div class="col-sm-8">
            <div style="margin-top: 8px;">
                <div class="col-sm-3" style="padding-left: 0;">
                    <input name="From" id="From" value="{{ From }}" class="form-control" placeholder="Effective From *" style="width: 100%;" required data-provide="datepicker" data-date-format="yyyy-mm-dd">
                </div>
                <div class="col-sm-3" style="padding-left: 0;">
                    <input name="To" id="To" value="{{ To }}" class="form-control" placeholder="Effective To *" style="width: 100%;" required data-provide="datepicker" data-date-format="yyyy-mm-dd">
                </div>
                <div class="col-sm-2" style="padding-left: 0;">
                    <select name="Active" class="form-control" style="width: 100%;">
                        {% if Active == 'Y' %}
                            <option selected value="Y">Active (Yes)</option>
                            <option value="N">Active (No)</option>
                        {% else %}
                            <option value="Y">Active (Yes)</option>
                            <option selected value="N">Active (No)</option>
                        {% endif %}
                    </select>
                </div>

                {% set CurrencyList = getRecord('MKT_CURRENCY') %}
                {% if len(CurrencyList) > 0 %}
                    <div class="col-sm-2" style="padding-left: 0;">
                        <select name="FilterCurrency" class="form-control" style="width: 100%;">
                        {% for CurrObj in CurrencyList %}
                            <option {{'selected' if FilterCurrency == CurrObj.ID else ''}}  value="{{ CurrObj.ID }}">{{ CurrObj.ID }}</option>
                        {% endfor %} 
                        </select>
                    </div>
                {% endif %}
                <label class="col-sm-1" style="padding:0px;">
                    <button type="submit" class="btn btn-flat btn-labeled btn-primary" style="padding: 7px 25px;">
                        Show
                    </button>
                </label>
            </div>
        </div>
        <div class="col-sm-4" style="margin-top: 8px;">
            <div class="pull-right">
                <a id="print" onclick="" class="btn btn-flat btn-labeled btn-success">
                    <span class="btn-label icon fa fa-print"></span> Print
                </a>
            </div>
        </div>
    </form>
</div>
{% endif %} -->

This is what I tried on regex101.

Please help me remove all nested HTML comments from the string above. Thank you.


Answers


  1. Assuming your problem statement could be described as removing the inner HTML comments only, you could use the following regex, which uses a tempered dot trick:

    Find:    <!--(?:(?!<!--).)*?-->
    Replace: (empty)
    Flags:   g, s (dot-all, so that . also matches the newlines inside multi-line comments)
    


    Explanation:

    • <!-- match leading <!--
    • (?:(?!<!--).)*? match any content WITHOUT crossing another <!--
    • --> then match the nearest closing -->
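A minimal JavaScript sketch of applying this pattern, assuming the s (dotAll) flag so that . also spans the multi-line inner comments in the question; stripInnermostComments is a hypothetical name. As the sample calls show, it deletes each innermost comment together with its content, and a top-level comment that contains no other comment is deleted too.

```javascript
// Hypothetical helper around the tempered-dot pattern above.
// "s" (dotAll) lets "." match newlines; "g" replaces every match.
// Each match is a "<!--...-->" span that contains no other "<!--",
// so innermost comments are removed together with their content.
function stripInnermostComments(html) {
  return html.replace(/<!--(?:(?!<!--).)*?-->/gs, "");
}

console.log(JSON.stringify(stripInnermostComments("<!-- a <!-- b --> -->")));
// "<!-- a  -->"
console.log(JSON.stringify(stripInnermostComments("<!-- x -->")));
// ""
```

Without the s flag, the inner comments that span several lines in the question would not be matched at all.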
  2. I doubt there’s an elegant regex-only solution. Instead, split the string on the comment delimiters and track the nesting depth:

    // "template" contains the original string
    
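    // Splitting on a capturing group keeps the "<!--" and "-->" delimiters in "parts"
    let parts = template.split(/(<!--|-->)/);
    
    let result = '';
    let state = 0; // current comment nesting depth
    
    for (const part of parts) {
        switch(part) {
            case '<!--':
                // keep the opener only when it starts a top-level comment
                if (state === 0) {
                    result += part;
                }
                state++;
                break;
                
            case '-->':
                // keep the closer only when it ends a top-level comment
                if (state === 1) {
                    result += part;
                }
                state--;
                break;
                
            default:
                // plain text, including the content of inner comments, is always kept
                result += part;
        }
    }
    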
    console.log(result);
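The same depth-counting idea can also be folded into a single String.prototype.replace call with a replacer function. This is only a sketch: removeNestedCommentDelimiters is a hypothetical name, and one deliberate difference from the loop above is that a stray --> outside any comment is kept as-is and does not push the depth below zero.

```javascript
// Hypothetical helper: keeps only the outermost "<!--" / "-->" pair and
// drops the delimiters of any comment nested inside it, keeping its content.
function removeNestedCommentDelimiters(template) {
  let depth = 0; // current comment nesting depth
  return template.replace(/<!--|-->/g, (delim) => {
    if (delim === "<!--") {
      depth += 1;
      // Keep the opener only when it starts a top-level comment.
      return depth === 1 ? delim : "";
    }
    // delim is "-->": a stray closer outside any comment is left untouched.
    if (depth === 0) return delim;
    depth -= 1;
    // Keep the closer only when it ends a top-level comment.
    return depth === 0 ? delim : "";
  });
}

const input = "<!-- some text <!-- some text --> -->";
console.log(JSON.stringify(removeNestedCommentDelimiters(input)));
// "<!-- some text  some text  -->"
```

Like the split-based loop, it keeps the text of inner comments and only drops their delimiters, which is what the expected output in the question asks for.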
    
  3. This isn’t possible using plain JavaScript regexp functionality, and I think Tim Biegeleisen’s answer already comes as close as it gets, though it’ll remove any markup comment that contains no other comment (content included), not just the "nested" ones. For a regexp to target only nested comments, and moreover to target only the comment delimiters rather than the entire comment, you would need functionality such as that provided by traditional Unix sed (or ed/vi), which lets you match a regexp within a larger regexp match.

    Btw., your "nested" commenting syntax is invalid in HTML according to the WHATWG HTML spec, chapter 13.1.6, paragraph 2:

    Optionally, text, with the additional restriction that the text must not start with the string ">", nor start with the string "->", nor contain the strings "<!--", "-->", or "--!>", nor end with the string "<!-".

    It is also invalid in SGML, where pairs of -- sequences delimit comments within any markup declaration (XML and HTML comments being just a special case):

    <!ENTITY bla "some text"
        -- an SGML comment --
        -- another comment -->
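As a rough illustration of the WHATWG restriction quoted above, here is a sketch that checks the text between "<!--" and "-->" against it; isValidCommentText is a hypothetical name, and it covers only that one rule, not full HTML parsing. The "nested" form fails because its text contains both "<!--" and "-->".

```javascript
// Hypothetical checker for the comment-text rule quoted from the WHATWG spec.
// "text" is everything between the opening "<!--" and the closing "-->".
function isValidCommentText(text) {
  // must not start with ">" or "->"
  if (text.startsWith(">") || text.startsWith("->")) return false;
  // must not contain "<!--", "-->" or "--!>"
  for (const forbidden of ["<!--", "-->", "--!>"]) {
    if (text.includes(forbidden)) return false;
  }
  // must not end with "<!-"
  return !text.endsWith("<!-");
}

console.log(isValidCommentText(" some text "));                   // true
console.log(isValidCommentText(" some text <!-- some text --> ")); // false
```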
    

    I’m noting this so that you can bite the bullet and fix those "nested comments" manually in a text editor: this "nested comment" pattern won’t show up again, so it doesn’t seem to warrant programming a solution 😉
