I have a string of nested HTML comments that looks like this: <!-- some text <!-- some text --> -->. I want to remove every comment tag that is inside another comment, so that the final result is <!-- some text some text -->.
This is the regex pattern I used, but it does not match the comments the way I want:
<!--[\s\S]+?(<!--[\s\S]+?-->)[\s\S]+?-->
It is meant to match this string:
<!-- {% set Navbar = request.Navbar if request.Navbar else "True" %} -->
<!-- {% if Navbar == "True" %}
<div class="navbar navbar-default navbar-static-top">
<form method="GET" id="frm" name="frm" role="form">
<div class="col-sm-8">
<div style="margin-top: 8px;">
<div class="col-sm-3" style="padding-left: 0;">
<input name="From" id="From" value="L___ From R___" class="form-control" placeholder="Effective From *" style="width: 100%;" required data-provide="datepicker" data-date-format="yyyy-mm-dd">
</div>
<div class="col-sm-3" style="padding-left: 0;">
<input name="To" id="To" value="L___ To R___" class="form-control" placeholder="Effective To *" style="width: 100%;" required data-provide="datepicker" data-date-format="yyyy-mm-dd">
</div>
<div class="col-sm-2" style="padding-left: 0;">
<select name="Active" class="form-control" style="width: 100%;">
<!-- {% if Active == 'Y' %}
<option selected value="Y">Active (Yes)</option>
<option value="N">Active (No)</option>
{% else %}
<option value="Y">Active (Yes)</option>
<option selected value="N">Active (No)</option>
{% endif %} -->
</select>
</div>
<!-- {% set CurrencyList = getRecord('MKT_CURRENCY') %} -->
<!-- {% if len(CurrencyList) > 0 %}
<div class="col-sm-2" style="padding-left: 0;">
<select name="FilterCurrency" class="form-control" style="width: 100%;">
<!-- {% for CurrObj in CurrencyList %}
<option L___'selected' if FilterCurrency == CurrObj.ID else ''R___ value="L___ CurrObj.ID R___">L___ CurrObj.ID R___</option>
{% endfor %} -->
</select>
</div>
{% endif %} -->
<label class="col-sm-1" style="padding:0px;">
<button type="submit" class="btn btn-flat btn-labeled btn-primary" style="padding: 7px 25px;">
Show
</button>
</label>
</div>
</div>
<div class="col-sm-4" style="margin-top: 8px;">
<div class="pull-right">
<a id="print" onclick="" class="btn btn-flat btn-labeled btn-success">
<span class="btn-label icon fa fa-print"></span> Print
</a>
</div>
</div>
</form>
</div>
{% endif %} -->
And the result after removing the nested comments should be:
<!-- {% set Navbar = request.Navbar if request.Navbar else "True" %} -->
<!-- {% if Navbar == "True" %}
<div class="navbar navbar-default navbar-static-top">
<form method="GET" id="frm" name="frm" role="form">
<div class="col-sm-8">
<div style="margin-top: 8px;">
<div class="col-sm-3" style="padding-left: 0;">
<input name="From" id="From" value="L___ From R___" class="form-control" placeholder="Effective From *" style="width: 100%;" required data-provide="datepicker" data-date-format="yyyy-mm-dd">
</div>
<div class="col-sm-3" style="padding-left: 0;">
<input name="To" id="To" value="L___ To R___" class="form-control" placeholder="Effective To *" style="width: 100%;" required data-provide="datepicker" data-date-format="yyyy-mm-dd">
</div>
<div class="col-sm-2" style="padding-left: 0;">
<select name="Active" class="form-control" style="width: 100%;">
{% if Active == 'Y' %}
<option selected value="Y">Active (Yes)</option>
<option value="N">Active (No)</option>
{% else %}
<option value="Y">Active (Yes)</option>
<option selected value="N">Active (No)</option>
{% endif %}
</select>
</div>
{% set CurrencyList = getRecord('MKT_CURRENCY') %}
{% if len(CurrencyList) > 0 %}
<div class="col-sm-2" style="padding-left: 0;">
<select name="FilterCurrency" class="form-control" style="width: 100%;">
{% for CurrObj in CurrencyList %}
<option L___'selected' if FilterCurrency == CurrObj.ID else ''R___ value="L___ CurrObj.ID R___">L___ CurrObj.ID R___</option>
{% endfor %}
</select>
</div>
{% endif %}
<label class="col-sm-1" style="padding:0px;">
<button type="submit" class="btn btn-flat btn-labeled btn-primary" style="padding: 7px 25px;">
Show
</button>
</label>
</div>
</div>
<div class="col-sm-4" style="margin-top: 8px;">
<div class="pull-right">
<a id="print" onclick="" class="btn btn-flat btn-labeled btn-success">
<span class="btn-label icon fa fa-print"></span> Print
</a>
</div>
</div>
</form>
</div>
{% endif %} -->
This is what I tried on regex101.
Please help me remove any nested HTML comments from the string above. Thank you.
3 Answers
Assuming your problem statement could be described as removing the inner HTML comments only, you could use the following regex, which uses a tempered dot trick:
Explanation:
<!--    match the leading <!--
(?:(?!<!--).)*?    match any content WITHOUT crossing another <!--
-->    then match the nearest closing -->
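Put together, the three pieces above read `<!--(?:(?!<!--).)*?-->`. The following is a minimal JavaScript sketch of how that pattern might be applied, not the answerer's exact code. It assumes the input can span several lines, so `.` is swapped for `[\s\S]`; the variable names are made up for illustration.

```javascript
// Pattern assembled from the pieces explained above.
// Assumption: [\s\S] is used instead of "." so a match can cross line breaks.
const innermostComment = /<!--(?:(?!<!--)[\s\S])*?-->/g;

const input = "<!-- some text <!-- some text --> -->";

// Only the innermost comment matches, because the tempered dot
// refuses to step over a second "<!--".
const matches = input.match(innermostComment);
console.log(matches);

// Replacing the match with an empty string removes the inner comment entirely,
// including its text.
const removed = input.replace(innermostComment, "");
console.log(removed);
```

Note that the pattern also matches an ordinary comment that is not nested at all, so a blind replace would remove those as well.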
I doubt there’s an elegant solution with regex:
This isn’t possible using plain JavaScript regexp functionality, and I think Tim Biegeleisen’s answer already comes as close as it gets, though it’ll remove any markup comment, not just the "nested" ones. For a regexp to target only nested comments, and moreover only the comment delimiters rather than the entire comment, you would need functionality such as that provided by traditional Unix sed (or ed/vi), which lets you match a regexp within a larger regexp match.
By the way, your "nested" commenting syntax is invalid in HTML according to the WHATWG HTML spec, chapter 13.1.6, paragraph 2:
It is also invalid in SGML, where pairs of -- sequences delimit comments in any markup declaration (XML and HTML comments being just a special case):
I note this so that you can bite the bullet and change those "nested comments" manually in a text editor, because this "nested comment" phenomenon won’t show up again and thus doesn’t seem to warrant programming a solution 😉
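If editing by hand is not practical, one alternative to a single regular expression is a small depth-counting pass over the delimiters. The sketch below is only an illustration under stated assumptions: the comments are balanced, every `<!--` and `-->` in the input is a comment delimiter, and the function name `stripNestedCommentDelimiters` is hypothetical. It keeps the outermost pair and drops any delimiter found inside an already open comment, leaving the text between them in place.

```javascript
// Hypothetical helper: removes comment delimiters that sit inside another comment.
// Assumes balanced comments; surrounding whitespace is left untouched.
function stripNestedCommentDelimiters(html) {
  let depth = 0; // number of currently open comments
  return html.replace(/<!--|-->/g, (token) => {
    if (token === "<!--") {
      depth += 1;
      // Keep the opener only when it starts a top-level comment.
      return depth === 1 ? token : "";
    }
    // A stray closer outside any comment is left as it is.
    if (depth === 0) return token;
    depth -= 1;
    // Keep the closer only when it ends the top-level comment.
    return depth === 0 ? token : "";
  });
}

const nested = "<!-- a <!-- b --> -->";
console.log(stripNestedCommentDelimiters(nested));
```

Top-level comments that contain no nested comment pass through unchanged, which is the behavior the expected output in the question shows for its first line.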