
I am trying to strip all HTML tags, except those on the first line, using this regex:

(?ms)(?!\A)<[^>]*>

It's very close to working, but it also strips the closing tag from the first line. The example I am working with is:

<div id="uniquename">https://www.example.com?item_id=10302</div>
<div id="uniqname2">
<div id="uniqname3">
<h2 id="uniqnametitle">Title</h2>
<div class="row">
<div class="large-3 columns">Example:</div>
<div class="large-9 columns"><b>Sub example</b></div>
</div>
<div class="row">
<div class="large-3 columns">Additional</div>

The current regex removes all the other HTML tags and leaves the first line intact, except for its closing </div> tag, producing the following output:

<div id="uniquename">https://www.example.com?item_id=10302
Title
Example:
Sub example
Additional

If there is a better way to do this with a regex than excluding the first line, I am open to suggestions. Skipping the first line seems to be the easiest approach; however, I need its closing tag to stay intact.

What am I missing in my regex?


Answers


  1. You should use an HTML parser in general…

    However, you can do:

    $ cat <(head -n 1 file) <(sed 1d file | sed -E 's/<[^>]*>//g; /^$/d')
    

    Or with awk:

    $ awk 'FNR==1 {print; next}
          {gsub(/<[^>]*>/,""); if ($0) print}' file
    

    Either prints:

    <div id="uniquename">https://www.example.com?item_id=10302</div>
    Title
    Example:
    Sub example
    Additional
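
    The same idea also fits in a single sed call that reads the input only once. This is a sketch, not part of the original answer: strip_tags_after_first_line is a hypothetical helper name, and it assumes a sed that supports the POSIX 1! address negation and {...} command grouping.

    ```shell
    # Hypothetical helper: reads the file(s) named as arguments, or stdin.
    # 1!{...} applies the grouped commands to every line except line 1:
    #   s/<[^>]*>//g  deletes each tag on the line,
    #   /^$/d         then drops the line if nothing but tags was on it.
    strip_tags_after_first_line() {
      sed '1!{s/<[^>]*>//g;/^$/d;}' "$@"
    }

    # Usage:
    #   strip_tags_after_first_line file
    ```

    Because it does not open the file twice (once for head, once for sed), this variant also works when the HTML arrives on a pipe.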
    
  2. You can try this regex:

    (?ms)((?<firstline>\A[^\n]*)|(<[^>]*>))

    with the substitution:

    $firstline

    Playground for your example – https://regex101.com/r/ASItOP/3
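
    Why the pattern in the question keeps the opening <div id="uniquename"> but loses its </div>: (?!\A) only forbids a match that starts at offset 0 of the input, so it protects that single tag rather than the whole line. The alternation above instead lets the first branch consume the entire first line, which the substitution writes back. The sketch below reproduces both behaviors with Perl, assuming perl is installed; the function names are hypothetical, Perl's numbered $1 stands in for the named firstline group, -0777 makes Perl read the whole input as one string so that \A means "start of file", and a final sed removes the lines that held only tags.

    ```shell
    # Pattern from the question: (?!\A) only rules out a match starting at
    # offset 0, so just the first opening tag survives and the first line's
    # closing </div> is removed along with every other tag.
    strip_with_lookahead() {
      perl -0777 -pe 's/(?!\A)<[^>]*>//g' "$@"
    }

    # Adapted from the alternation in this answer: at offset 0 the first branch
    # captures the whole first line and puts it back unchanged; every later
    # match can only be the tag branch, where $1 is undefined, so the tag
    # is replaced with "".
    strip_keep_first_line() {
      perl -0777 -pe 's/(\A[^\n]*)|<[^>]*>/defined $1 ? $1 : ""/ge' "$@" |
        sed '/^$/d'   # drop lines that contained only tags
    }

    # Usage:
    #   strip_keep_first_line file
    ```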

  3. Update 1: I just realized this could be massively simplified:

    gawk 'NR==!_ || (NF=NF)*/./' FS='<[^>]+>' OFS= file

    mawk 'NR==!_ || (NF=NF)*/./' FS='^(<[^>]+>)+|(<[/][^>]+>)+$' OFS= file

    Both print:

    <div id="uniquename">https://www.example.com?item_id=10302</div>
    Title
    Example:
    Sub example
    Additional
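
    Spelled out, the gawk one-liner relies on three awk behaviors: an unset variable such as _ evaluates to 0, so !_ is 1 and NR==!_ selects the first line; assigning to NF (or to any field) makes awk rebuild $0 from its fields joined with OFS, which is empty here, so the tags acting as field separators disappear; and the product with /./ is nonzero only when something is left on the line. The function below is a more verbose sketch of the same logic. Its name is hypothetical, and it assumes a POSIX awk that accepts a regular expression as FS.

    ```shell
    # Hypothetical helper: the same FS/OFS trick as the one-liner, written out.
    strip_tags_awk() {
      awk -v FS='<[^>]+>' -v OFS='' '
        NR == 1 { print; next }   # first line: print untouched (NR==!_ in the original)
        {
          $1 = $1                 # rebuild $0 from its fields joined by OFS (""), dropping the tags
          if (/./) print          # print only if something besides tags was left
        }' "$@"
    }

    # Usage:
    #   strip_tags_awk file
    ```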
    