I am trying to strip all HTML tags except those on the first line, using this regex:
(?ms)(?!\A)<[^>]*>
It is very close to working, but it also strips the closing tag from the first line. The example I am working with is:
<div id="uniquename">https://www.example.com?item_id=10302</div>
<div id="uniqname2">
<div id="uniqname3">
<h2 id="uniqnametitle">Title</h2>
<div class="row">
<div class="large-3 columns">Example:</div>
<div class="large-9 columns"><b>Sub example</b></div>
</div>
<div class="row">
<div class="large-3 columns">Additional</div>
The current regex removes every tag after the first line and keeps the first line's opening tag, but it also removes the trailing </div> from the first line. It outputs the following:
<div id="uniquename">https://www.example.com?item_id=10302
Title
Example:
Sub example
Additional
If there is a better way to do this than excluding the first line, I am open to suggestions. Skipping the first line seems to be the easiest way, but I need its closing tag to stay intact.
What am I missing in my REGEX?
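The behaviour described above can be reproduced with a short Python sketch. It assumes Python's `re` module treats `\A` the same way as the engine used in the question, and the variable names are only illustrative. The point it shows is that `(?!\A)` rejects a match only at position 0 of the whole input, so the opening tag at the very start is protected while the closing tag later on the same line is not.

```python
import re

# Shortened version of the sample from the question (illustrative only).
sample = (
    '<div id="uniquename">https://www.example.com?item_id=10302</div>\n'
    '<div id="uniqname2">\n'
    '<h2 id="uniqnametitle">Title</h2>'
)

# (?!\A) fails only at offset 0, so every tag that starts later is removed,
# including the </div> that closes the first line.
original = re.compile(r"(?ms)(?!\A)<[^>]*>")
stripped = original.sub("", sample)

print(stripped.splitlines()[0])
```

The first line comes back without its closing `</div>`, which matches the output shown in the question.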
3 Answers
You should use an HTML parser in general…
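As a rough illustration of the parser route, here is a minimal sketch using `html.parser` from Python's standard library. This is not the code from this answer; the class and function names (`TextCollector`, `strip_tags_keep_first_line`) are hypothetical, and it assumes the first line should be kept verbatim while only the text content of the remaining lines is wanted.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text nodes and ignores every tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags_keep_first_line(text):
    # Keep the first line untouched and parse only the remainder.
    first, sep, rest = text.partition("\n")
    collector = TextCollector()
    collector.feed(rest)
    collector.close()  # flush any buffered text
    return first + sep + "".join(collector.chunks)


sample = (
    '<div id="uniquename">https://www.example.com?item_id=10302</div>\n'
    '<div id="uniqname2">\n'
    '<h2 id="uniqnametitle">Title</h2>\n'
    '<div class="large-3 columns">Example:</div>'
)
print(strip_tags_keep_first_line(sample))
```

Because the parser, not a pattern, decides what a tag is, this copes better with attributes that contain `>` and similar edge cases.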
However, you can do:
Or an awk:
Either prints:
You can try this:
(?ms)((?<firstline>\A[^\n]*)|(<[^>]*>))
with the substitution:
$firstline
Playground for your example – https://regex101.com/r/ASItOP/3
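For anyone who wants to try the same idea outside regex101, here is a minimal Python sketch. It assumes Python's `re` module, where the named group is spelled `(?P<firstline>...)`, and the function name is hypothetical. The first alternative consumes the whole first line and puts it back, and the second alternative matches any tag and replaces it with nothing.

```python
import re

# First alternative: the entire first line, captured as "firstline".
# Second alternative: any tag elsewhere in the input.
PATTERN = re.compile(r"(?P<firstline>\A[^\n]*)|<[^>]*>")


def strip_tags_except_first_line(text):
    # Put the first line back unchanged; every other match becomes "".
    return PATTERN.sub(lambda m: m.group("firstline") or "", text)


sample = (
    '<div id="uniquename">https://www.example.com?item_id=10302</div>\n'
    '<div id="uniqname2">\n'
    '<h2 id="uniqnametitle">Title</h2>\n'
    '<div class="large-3 columns">Example:</div>'
)
print(strip_tags_except_first_line(sample))
```

Lines that contained only tags are left behind as empty lines, so a follow-up step may be needed if those should be dropped.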
UPDATE 1: I just realized it could be massively simplified