I am learning PHP.
I want to show a table of contents for an article by converting the headings (h2, h3, h4, ...)
into a list of links.
This is my PHP code:
$Post = '
<h2>Title 01</h2>
<h3>Title 01.01</h3>
<h3>Title 01.02</h3>
<h2>Title 02</h2>
<h3>Title 02.02</h3>
';
$c = 1;
$r = preg_replace_callback('~<h*([^>]*)>~i', function($res) use (&$c){
return '<li><a id="#id'.$c++.'">'.$res[1].'</a></li>';
}, $Post);
$Post = $r;
echo '<ul>';
echo $Post;
echo '</ul>';
The code does not work as intended. It produces the output below:
<ul>
<li><a id="#id1">2</a></li>Title 01<li><a id="#id2">/h2</a></li>
<li><a id="#id3">3</a></li>Title 01.01<li><a id="#id4">/h3</a></li>
<li><a id="#id5">3</a></li>Title 01.02<li><a id="#id6">/h3</a></li>
<li><a id="#id7">2</a></li>Title 02<li><a id="#id8">/h2</a></li>
<li><a id="#id9">3</a></li>Title 02.02<li><a id="#id10">/h3</a></li>
</ul>
I know that the PHP code is written incorrectly, but I want the output to look like this:
<ul>
<li><a href="#id1">Title 01</a></li>
<li><a href="#id2">Title 01.01</a></li>
<li><a href="#id3">Title 01.02</a></li>
<li><a href="#id4">Title 02</a></li>
<li><a href="#id5">Title 02.02</a></li>
</ul>
2 Answers
Your regular expression is needlessly complex.
You could just use
<h.>(.*)</h.>
to correctly match what you are trying to match. I added it to your snippet above to show your desired result:
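A minimal sketch of how that pattern could be dropped into the question's snippet. The variable name `$list` and the switch from `id` to `href` are assumptions made here to match the desired output; the rest is the code from the question.

```php
<?php
$Post = '
<h2>Title 01</h2>
<h3>Title 01.01</h3>
<h3>Title 01.02</h3>
<h2>Title 02</h2>
<h3>Title 02.02</h3>
';

$c = 1;
// Replace each whole heading element (opening tag, text, closing tag)
// with a list item that links to #idN.
// Without the "s" modifier, "." does not match newlines, so every match
// stays on a single line even though (.*) is greedy.
$list = preg_replace_callback('~<h.>(.*)</h.>~i', function ($res) use (&$c) {
    return '<li><a href="#id' . $c++ . '">' . $res[1] . '</a></li>';
}, $Post);

echo '<ul>' . $list . '</ul>';
```

Note that `<h.>` accepts any single character after the `h`, and the pattern only matches headings without attributes, so this is only suitable for input as simple as the example.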
Although, as suggested in the comments, you should probably use a parser here if this turns into anything more than a toy example. Beyond that point, regular expressions are almost always a sure way to shoot yourself in the foot.
Your regex is wrong for what you’re trying to do:
<h*
means that it will match an angle bracket followed by zero or more h’s, which basically means your regex is matching everything between each <> pairing (including </…>). You could do this to extract the titles from your headings:
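One way this extraction might look, as a sketch that assumes the headings have no attributes and no inner markup. The pattern and the `$titles` variable are illustrative, not taken from the original answer.

```php
<?php
$Post = '
<h2>Title 01</h2>
<h3>Title 01.01</h3>
<h3>Title 01.02</h3>
<h2>Title 02</h2>
<h3>Title 02.02</h3>
';

// Group 1 captures the heading level, group 2 the heading text.
// The backreference \1 requires the closing tag to have the same level
// as the opening tag.
preg_match_all('~<h([1-6])>([^<]*)</h\1>~i', $Post, $matches);

// $matches[2] holds the text of every heading, in document order.
$titles = $matches[2];
print_r($titles);
```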
But those links need to target the IDs in the headings, so you need to do this to extract them:
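A sketch of that step, under the assumption that every heading already carries an `id` attribute and that it is the heading's only attribute. The `id1` to `id5` values and the `$toc` variable are made up here to match the output the question asks for.

```php
<?php
// Assumption: the article's headings already have id attributes.
$Post = '
<h2 id="id1">Title 01</h2>
<h3 id="id2">Title 01.01</h3>
<h3 id="id3">Title 01.02</h3>
<h2 id="id4">Title 02</h2>
<h3 id="id5">Title 02.02</h3>
';

// Group 1: heading level, group 2: id value, group 3: heading text.
// PREG_SET_ORDER yields one array per heading instead of one per group.
preg_match_all(
    '~<h([1-6]) id="([^"]*)">([^<]*)</h\1>~i',
    $Post,
    $matches,
    PREG_SET_ORDER
);

$toc = "<ul>\n";
foreach ($matches as $m) {
    // Link each list item to the id found on the heading.
    $toc .= '<li><a href="#' . $m[2] . '">' . $m[3] . "</a></li>\n";
}
$toc .= '</ul>';

echo $toc;
```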
But what if you’ve got other attributes on the heading?
Or markup inside the heading?
Regex is not a great way to parse HTML. It is a powerful tool, and it is possible to use it for this, but there are better ways.
I’ve assumed there is no markup in the headings, but only a couple of changes are needed to copy inner markup if necessary.
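A parser-based version could look roughly like the sketch below. It uses PHP's built-in `DOMDocument` and `DOMXPath`; the choice of these classes, the wrapper `<div>`, the generated `idN` values, and the variable names are assumptions for illustration rather than the answer's original code.

```php
<?php
$Post = '
<h2>Title 01</h2>
<h3>Title 01.01</h3>
<h3>Title 01.02</h3>
<h2>Title 02</h2>
<h3>Title 02.02</h3>
';

// Suppress libxml warnings about imperfect markup while parsing.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
// Wrap the fragment in a div so it can be found and serialized again later.
$doc->loadHTML('<div>' . $Post . '</div>');
libxml_clear_errors();

// Select every h2..h6 element in document order.
$xpath = new DOMXPath($doc);
$headings = $xpath->query(
    '//*[self::h2 or self::h3 or self::h4 or self::h5 or self::h6]'
);

$c = 1;
$toc = "<ul>\n";
foreach ($headings as $h) {
    // Reuse an existing id, or give the heading a generated one.
    $id = $h->getAttribute('id');
    if ($id === '') {
        $id = 'id' . $c;
        $h->setAttribute('id', $id);
    }
    $c++;
    // textContent drops any inner markup and keeps only the text.
    $toc .= '<li><a href="#' . htmlspecialchars($id) . '">'
          . htmlspecialchars($h->textContent) . "</a></li>\n";
}
$toc .= '</ul>';

// Serialize the children of the wrapper div to get the article back,
// now with id attributes on the headings.
$body = '';
foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $node) {
    $body .= $doc->saveHTML($node);
}

echo $toc, "\n", $body;
```

Because the parser handles attributes and nested tags itself, the two "what if" cases above no longer need special patterns.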