Regex: Exclude first line with brackets - PHP

DonDuck
January 13, 2023
171 views
2 votes
3 Answers

I am trying to strip all HTML tags, except anything on the first line, using this regex:

(?ms)(?!\A)<[^>]*>

It’s very close to working. Unfortunately, it strips the closing tag from the first line as well. The example I am working with is:

<div id="uniquename">https://www.example.com?item_id=10302</div>
<div id="uniqname2">
<div id="uniqname3">
<h2 id="uniqnametitle">Title</h2>
<div class="row">
<div class="large-3 columns">Example:</div>
<div class="large-9 columns"><b>Sub example</b></div>
</div>
<div class="row">
<div class="large-3 columns">Additional</div>

The current regex removes all other HTML tags and leaves the first line alone, except for its trailing closing div tag, and outputs the following:

<div id="uniquename">https://www.example.com?item_id=10302
Title
Example:
Sub example
Additional

If there is a better way to perform the REGEX than excluding the first line I am open to suggestions. Skipping the first line seems to be the easiest way, however, I need the end bracket to stay intact.

What am I missing in my REGEX?

Tags: html php

Answers

- dawg
- January 13, 2023 at 11:35 pm
- 0 votes
You should use an HTML parser in general…

However, you can do:
```
$ cat <(head -n 1 file) <(sed 1d file | sed -E 's/<[^>]*>//g; /^$/d')
```
Or with awk:
```
$ awk 'FNR==1 {print; next}
      {gsub(/<[^>]*>/,""); if ($0) print}' file
```
Either prints:
```
<div id="uniquename">https://www.example.com?item_id=10302</div>
Title
Example:
Sub example
Additional
```
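Since the question is tagged PHP, the same split-then-strip idea can be sketched there as well. This is only a minimal sketch: the function name `stripTagsExceptFirstLine` and the shortened sample input are made up for illustration, and it assumes `\n` line endings and PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the answer): keeps line 1 verbatim and
// strips tags from every later line, dropping lines that end up empty.
function stripTagsExceptFirstLine(string $html): string
{
    // Split off the first line; everything after it goes into $rest.
    $parts = explode("\n", $html, 2);
    $kept  = [$parts[0]];
    $rest  = $parts[1] ?? '';

    foreach (explode("\n", $rest) as $line) {
        // Same pattern as the sed/awk commands: remove anything shaped like a tag.
        $line = preg_replace('/<[^>]*>/', '', $line);
        if ($line !== '') {
            $kept[] = $line;
        }
    }

    return implode("\n", $kept);
}

// Shortened sample based on the question's input.
$sample = <<<'HTML'
<div id="uniquename">https://www.example.com?item_id=10302</div>
<div id="uniqname2">
<h2 id="uniqnametitle">Title</h2>
<div class="large-3 columns">Example:</div>
HTML;

echo stripTagsExceptFirstLine($sample), "\n";
```

Handling the first line separately avoids anchoring tricks in the pattern, at the cost of a loop over the remaining lines.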

- FireAlkazar
- January 13, 2023 at 11:50 pm
- 0 votes
You can try this
(?ms)((?<firstline>\A[^\n]*)|(<[^>]*>))
With substitution
$firstline

Playground for your example – https://regex101.com/r/ASItOP/3
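In PHP, the alternation idea might look roughly like the sketch below. It uses a numbered group instead of a named one, because `preg_replace` replacement strings refer to groups by number. The sample input is shortened from the question, and the second `preg_replace` call is an added assumption to collapse the blank lines that tag-only lines leave behind. A negative lookahead on the start-of-subject anchor only blocks a match at offset 0, which is why the closing tag on the first line was still being removed; capturing the whole first line and putting it back avoids that.

```php
<?php
// Hypothetical sample input, shortened from the question.
$html = <<<'HTML'
<div id="uniquename">https://www.example.com?item_id=10302</div>
<div id="uniqname2">
<h2 id="uniqnametitle">Title</h2>
<div class="large-9 columns"><b>Sub example</b></div>
HTML;

// Group 1 captures the entire first line; the second alternative matches any other tag.
$pattern = '/(\A[^\n]*)|<[^>]*>/';

// Replacing with $1 puts the first line back and replaces every other tag with nothing.
$stripped = preg_replace($pattern, '$1', $html);

// Assumed cleanup step: collapse runs of newlines left by lines that held only tags.
$result = preg_replace('/\n{2,}/', "\n", $stripped);

echo $result, "\n";
```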


- RAREKpopManifesto
- January 14, 2023 at 3:46 am
- 0 votes
UPDATE 1 : just realized it could be massively simplified
```
gawk 'NR==!_ || (NF=NF)*/./' FS='<[^>]+>' OFS=
```
```
mawk 'NR==!_ || (NF=NF)*/./' FS='^(<[^>]+>)+|(<[/][^>]+>)+$' OFS=
```
```
<div id="uniquename">https://www.example.com?item_id=10302</div>
Title
Example:
Sub example
Additional
```