
I have an HTML string like this:

$text = "<strong>text</strong>more text<em>more text</em>additional text<span>text</span>....."

I would like to replace every x character with ?, but only inside the <em> tag.

The string above will become:

$text = "<strong>text</strong>more text<em>more te?t</em>additional text<span>text</span>....."

How can it be done in PHP?

2 Answers


  1. One option is to use the DOM extension. It’s convoluted and documentation could use more examples, but it gets the job done:

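    $text = "<strong>text</strong>more text<em>more text</em>additional text<span>text</span>.....";
    
    $document = new DOMDocument();
    // loadHTML() wraps the fragment in implied <html> and <body> elements.
    $document->loadHTML($text);
    foreach ($document->getElementsByTagName('em') as $em) {
        // Replace every "x" with "?" in the text of each <em>.
        $em->nodeValue = strtr($em->nodeValue, 'x', '?');
    }
    // Print only the children of <body>, i.e. the original fragment without the wrapper.
    foreach ($document->getElementsByTagName('body')->item(0)->childNodes as $node) {
        echo $document->saveHTML($node);
    }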
    


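    Note that my example assumes the <em> element contains only text, with no other tags inside it: assigning to nodeValue replaces any child elements with plain text.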
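If the <em> can contain other tags, one way around that limitation is to change the text nodes instead of the element itself. The following is only a sketch under that assumption; `replaceInEm()` is a hypothetical helper name and the sample input with a nested <b> is made up for illustration.

```php
<?php

/**
 * Replace every "x" with "?" in all text nodes located anywhere inside <em> elements.
 * Hypothetical helper, not part of the original answer.
 */
function replaceInEm(string $html): string
{
    $document = new DOMDocument();
    $document->loadHTML($html);

    $xpath = new DOMXPath($document);
    // Select text nodes at any depth below an <em>, so nested elements stay intact.
    foreach ($xpath->query('//em//text()') as $textNode) {
        $textNode->nodeValue = strtr($textNode->nodeValue, 'x', '?');
    }

    // Serialize only the children of the implied <body> wrapper.
    $output = '';
    foreach ($document->getElementsByTagName('body')->item(0)->childNodes as $node) {
        $output .= $document->saveHTML($node);
    }

    return $output;
}

// Made-up sample: the <em> contains a nested <b> element.
echo replaceInEm('<strong>text</strong>more text<em>more <b>text</b> x</em><span>text</span>');
```

Because only text nodes are modified, the nested <b> keeps its tags, and any x outside an <em> is left alone.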

  2. Álvaro's answer works well, but there might be an easier solution using XPath.

    <?php
    
    declare(strict_types=1);
    
    namespace Marcel;
    
    use DOMDocument;
    use DOMXPath;
    
    $text = "<strong>text</strong>more text<em>more text</em>additional text<span>text</span>";
    
    $doc = new DOMDocument();
    $doc->loadHTML($text, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    
    $xpath = new DOMXPath($doc);
    $elements = $xpath->query('//em[contains(text(), "x")]');
    
    if ($elements !== false) {
        foreach ($elements as $element) {
            $element->textContent = str_replace('x', '?', $element->textContent);
        }
    }
    
    $text = $doc->saveHTML();
    

    What does the code snippet do?

    1. Load the HTML fragment with DOMDocument and pass LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD, so that no implied <html>/<body> wrapper and no default doctype are added. That becomes important when saving the manipulated HTML string: without these flags, the saved string would start with an HTML doctype.
    2. Build a DOMXPath instance from the DOMDocument and query for all <em> elements whose text contains the letter "x". The result is either false or a DOMNodeList instance.
    3. If the result is not false, iterate over the DOMNodeList and replace the letter "x" with a question mark.
    4. Save the manipulated DOMDocument back to a string. Because DOMDocument was told not to add a doctype or wrapper elements, it serializes only the loaded content.
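One detail of step 2 is worth checking against your own markup. In XPath 1.0, a predicate of the form `contains(text(), "x")` converts only the first text-node child of each <em> to a string, while `contains(., "x")` tests the whole text content of the element, descendants included. The small comparison below uses a made-up sample in which the only "x" sits inside a nested <b>:

```php
<?php

// Made-up sample: the only "x" inside the <em> is in a nested <b>.
$html = '<div><em>more <b>text</b></em></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// text() selects the direct text children; contains() only looks at the first one ("more ").
$firstTextOnly = $xpath->query('//em[contains(text(), "x")]');

// "." is the string value of the whole element, including text in descendants.
$wholeElement = $xpath->query('//em[contains(., "x")]');

printf(
    "text(): %d match(es), '.': %d match(es)\n",
    $firstTextOnly->length,
    $wholeElement->length
);
// prints: text(): 0 match(es), '.': 1 match(es)
```

For the string in the question both predicates match, since the <em> holds a single text node.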

    The result

    The result is a clean string with replaced "x" letters in the text nodes of <em> elements.

    <strong>text</strong>more text<em>more te?t</em>additional text<span>text</span>
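LIBXML_HTML_NOIMPLIED is generally reported to work best when the markup has a single root element; with several top-level nodes, as in this fragment, some libxml versions may nest the later nodes inside the first element. A common workaround, sketched here as an assumption rather than as part of the original answer, is to wrap the fragment in a container and serialize only the container's children. `replaceXInEmFragment()` and the <div> wrapper are hypothetical choices.

```php
<?php

/**
 * Replace "x" with "?" inside <em> elements of an HTML fragment.
 * Hypothetical helper: wraps the fragment in a <div> so the parser sees one root.
 */
function replaceXInEmFragment(string $fragment): string
{
    $doc = new DOMDocument();
    $doc->loadHTML(
        '<div>' . $fragment . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    $xpath = new DOMXPath($doc);
    // Only touch text nodes below an <em> that actually contain an "x".
    $textNodes = $xpath->query('//em//text()[contains(., "x")]');
    if ($textNodes !== false) {
        foreach ($textNodes as $textNode) {
            $textNode->nodeValue = str_replace('x', '?', $textNode->nodeValue);
        }
    }

    // Serialize the children of the wrapper, not the wrapper itself.
    $result = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }

    return $result;
}

$text = '<strong>text</strong>more text<em>more text</em>additional text<span>text</span>';
echo replaceXInEmFragment($text);
```

The wrapper never appears in the output, so the caller gets the fragment back in its original shape.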

    Why use XPath instead of RegEx?

    The answer is simple: because it's HTML. HTML is not a regular language, so regular expressions break easily on nested tags, attributes, or comments, whereas PHP's own DOM library is made for exactly this kind of manipulation. The code example needs only one iteration, and that iteration walks the DOMNodeList through its built-in iterator instead of first copying the matches into a PHP array, so no second copy of the result set is held in memory. Long story short: more robust, and no extra memory spent on intermediate arrays.
