
Hi, I need a script that removes, from an HTML string, every "li" element that is empty or contains only whitespace, and also every "li" that contains only empty tags (a single empty tag or several nested ones).

I use the preg_replace below, which successfully removes only the completely empty "li" (the 4th one in this example).

But I don't know how to remove the last "li", which has an empty "span" inside it. Any suggestions? Thanks

$contenuto = '<ol style="margin-top: 0cm; margin-bottom: 0cm;">
<li style="margin: 0cm 0cm 0cm 47.6px; text-align: justify; line-height: normal; font-size: 11pt; font-family: Calibri, sans-serif; text-indent: 0.4px;"><span style="font-size: 10.0pt;">x</span></li>
<li style="margin: 0cm 0cm 0cm 47.6px; text-align: justify; line-height: normal; font-size: 11pt; font-family: Calibri, sans-serif; text-indent: 0.4px;"><span style="font-size: 10.0pt;">y</span></li>
<li style="margin: 0cm 0cm 0cm 47.6px; text-align: justify; line-height: normal; font-size: 11pt; font-family: Calibri, sans-serif; text-indent: 0.4px;"><span style="font-size: 10.0pt;">z</span></li>
<li style="margin: 0cm 0cm 0cm 47.6px; text-align: justify; line-height: normal; font-size: 11pt; font-family: Calibri, sans-serif; text-indent: 0.4px;"></li>
<li style="margin: 0cm 0cm 0cm 47.6px; text-align: justify; line-height: normal; font-size: 11pt; font-family: Calibri, sans-serif; text-indent: 0.4px;"><span style="font-size: 10.0pt; color: red;"> </span></li>
</ol>';

$contenuto = preg_replace('/<li[^>]*>(\s|&nbsp;)*<\/li>/', '', $contenuto);

echo $contenuto;

2 Answers


  1. Two points before my answer:

    1. @bobble bubble is right that you can parse small pieces of HTML
      with a regex, especially when you are sure about the encoding and
      language…
    2. ChatGPT can help when you are dealing with regexes; it works well
      when you need something simple.

    Here is my answer:

    $regex = '/<li[^>]*>(?:\s*|(?:<[^>\/]+[^>]*>\s*<\/[^>]+>)(?:\s*|<\/?\w+[^>]*>\s*))<\/li>/s';
    $contenuto = preg_replace($regex, '', $contenuto);
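
    A quick way to check this pattern is to run it on a shortened list. The sketch below assumes the escapes written here (`\s`, `\w`, `\/`) are the intended ones, and `$sample` is a trimmed-down, hypothetical stand-in for the question's markup rather than the original string. It also shows a limit of the pattern: an "li" whose empty tags are nested two levels deep is left in place.

    ```php
    <?php
    // Pattern from the answer, with the escapes written out (assumed to be the intended ones).
    $regex = '/<li[^>]*>(?:\s*|(?:<[^>\/]+[^>]*>\s*<\/[^>]+>)(?:\s*|<\/?\w+[^>]*>\s*))<\/li>/s';

    // Hypothetical sample: one filled item, one empty item,
    // one item with an empty span, one item with two nested empty tags.
    $sample = '<ol>'
        . '<li><span>x</span></li>'
        . '<li></li>'
        . '<li><span> </span></li>'
        . '<li><b><i> </i></b></li>'
        . '</ol>';

    $result = preg_replace($regex, '', $sample);

    // The 2nd and 3rd items are removed; the doubly nested 4th item is not.
    echo $result, PHP_EOL;
    ```

    If deeper nesting can occur in your input, a DOM-based approach is probably the safer choice.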
    
  2. The XPath to select the empty li nodes is

    //li[not(normalize-space())]
    

    It's not what you asked for, but I find it much more concise, more readable, and easier to come up with than a reliable regex that does the same.

    Unfortunately, PHP doesn't have anything like an xpath_replace function that hides the boilerplate the way preg_replace does for a regex. So you have to write some additional code to get your desired output:

    <?php
    $html = '<ol style="margin-top: 0cm; margin-bottom: 0cm;">
    <li style="margin: 0cm 0cm 0cm 47.6px; text-align: justify; line-height: normal; font-size: 11pt; font-family: Calibri, sans-serif; text-indent: 0.4px;"><span style="font-size: 10.0pt;">x</span></li>
    <li style="margin: 0cm 0cm 0cm 47.6px; text-align: justify; line-height: normal; font-size: 11pt; font-family: Calibri, sans-serif; text-indent: 0.4px;"><span style="font-size: 10.0pt;">y</span></li>
    <li style="margin: 0cm 0cm 0cm 47.6px; text-align: justify; line-height: normal; font-size: 11pt; font-family: Calibri, sans-serif; text-indent: 0.4px;"><span style="font-size: 10.0pt;">z</span></li>
    <li style="margin: 0cm 0cm 0cm 47.6px; text-align: justify; line-height: normal; font-size: 11pt; font-family: Calibri, sans-serif; text-indent: 0.4px;"></li>
    <li style="margin: 0cm 0cm 0cm 47.6px; text-align: justify; line-height: normal; font-size: 11pt; font-family: Calibri, sans-serif; text-indent: 0.4px;"><span style="font-size: 10.0pt; color: red;"> </span></li>
    </ol>';
    
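    // Select every li whose text content is empty after whitespace normalization.
    $emptyListItems = '//li[not(normalize-space())]';
    
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    // Remove each matching li from its parent.
    foreach ($xpath->query($emptyListItems) as $node) {
        $node->parentNode->removeChild($node);
    }
    
    // Print only the ol, without the html/body wrapper that loadHTML adds.
    echo $dom->saveHTML($xpath->query('//ol')->item(0));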
    

    will output

    <ol style="margin-top: 0cm; margin-bottom: 0cm;">
    <li style="margin: 0cm 0cm 0cm 47.6px; text-align: justify; line-height: normal; font-size: 11pt; font-family: Calibri, sans-serif; text-indent: 0.4px;"><span style="font-size: 10.0pt;">x</span></li>
    <li style="margin: 0cm 0cm 0cm 47.6px; text-align: justify; line-height: normal; font-size: 11pt; font-family: Calibri, sans-serif; text-indent: 0.4px;"><span style="font-size: 10.0pt;">y</span></li>
    <li style="margin: 0cm 0cm 0cm 47.6px; text-align: justify; line-height: normal; font-size: 11pt; font-family: Calibri, sans-serif; text-indent: 0.4px;"><span style="font-size: 10.0pt;">z</span></li>
    
    
    </ol>
    

    Demo https://3v4l.org/OA5eV
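
    The boilerplate above could be wrapped in a small helper so it can be called like preg_replace. This is only a sketch: `removeEmptyListItems` is a hypothetical name, not a PHP built-in. It assumes the input is a UTF-8 fragment, wraps it in a temporary `div`, and also treats non-breaking spaces (`&nbsp;`, U+00A0) as whitespace, which plain `normalize-space()` does not.

    ```php
    <?php
    /**
     * Hypothetical helper (not part of PHP): removes every li whose text is
     * empty once ordinary whitespace and non-breaking spaces are ignored.
     */
    function removeEmptyListItems(string $html): string
    {
        $dom = new DOMDocument();
        // The XML declaration is a common hint to make libxml read the input as UTF-8;
        // the wrapper div gives the fragment a single root to read the result back from.
        $dom->loadHTML(
            '<?xml encoding="UTF-8"><div>' . $html . '</div>',
            LIBXML_NOERROR | LIBXML_NOWARNING
        );

        $xpath = new DOMXPath($dom);
        // translate() turns U+00A0 into a normal space before normalize-space() runs.
        $query = "//li[not(normalize-space(translate(., '\u{00A0}', ' ')))]";
        foreach ($xpath->query($query) as $node) {
            $node->parentNode->removeChild($node);
        }

        // Serialize only the children of the wrapper div, not html/body.
        $wrapper = $dom->getElementsByTagName('div')->item(0);
        $out = '';
        foreach ($wrapper->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
        return $out;
    }

    // Hypothetical sample: only the item containing "x" should remain.
    $cleaned = removeEmptyListItems(
        '<ol><li><span>x</span></li><li></li><li><span>&nbsp;</span></li><li><b><i> </i></b></li></ol>'
    );
    echo $cleaned, PHP_EOL;
    ```

    Because the check works on the text content of each "li", it does not matter how deeply the empty tags are nested.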
