
Problem

My job is to extract data from an XML document and build an HTML page using that data. I’m using PHP to parse and manipulate the XML document.

One portion of the XML document contains inlined elements used in a fashion similar to this:

<desc>
    These are the <special>&lt;best&gt;</special>
    chocolate chip cookies <special>&lt;EVER&gt;</special>
</desc>

I’d like to convert this into my HTML document like so:

These are the <em>&lt;best&gt;</em> chocolate chip cookies <em>&lt;EVER&gt;</em>

So that it displays in the browser as

These are the <best> chocolate chip cookies <EVER>

I’m currently using PHP’s SimpleXML module. I have no problem parsing the XML document and retrieving the parent element (<desc>).

My Attempt

I thought about doing a search and replace on the raw XML string to convert the <special> tags to my target tag (<em>), but, of course, the XML parser treats them the same way, just under the name <em> instead.

I also considered calling asXML() on the <desc> node at the point of use, doing the search and replace on that string, and echoing the result into the HTML document. At that point, though, it appears that the <special> nodes have already been parsed away, and I just get the string:

These are the <best> chocolate chip cookies <EVER>
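
A quick check suggests that asXML() does keep the <special> tags. The string above is more likely what a browser renders when the markup is echoed into a page, since it ignores the unknown <special> tags and decodes the entities. The following is a minimal sketch under that assumption; the <recipe> wrapper and the variable names are made up for illustration and are not from the original document.

```php
<?php
// Assumed sample document; <desc> is nested so asXML() returns just that element.
$sx = new SimpleXMLElement(
    '<recipe><desc>These are the <special>&lt;best&gt;</special> cookies</desc></recipe>'
);

$raw = $sx->desc->asXML();
// Escape the raw string (or use "view source") to see the markup that is really there.
echo htmlspecialchars($raw), "\n";

// The search-and-replace idea, applied to the serialized fragment.
$html = strtr($raw, [
    '<desc>'    => '',
    '</desc>'   => '',
    '<special>' => '<em>',
    '</special>' => '</em>',
]);
echo $html, "\n";
```

Literal replacement like this only holds while the tags carry no attributes or namespace prefixes, so it is best treated as a diagnostic rather than a robust conversion.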

I’ve also looked into the XMLReader class, but it seems to read the XML from a stream, so I can’t access the nodes I need arbitrarily when I need them.

I’d appreciate any advice. Thanks.

2 Answers


  1. Here is a solution that creates a DOM object from a SimpleXMLElement, and iterates over its child nodes to build the HTML:

    $xml = <<<XML
    <desc>
        These are the <special>&lt;best&gt;</special> chocolate chip cookies <special>&lt;EVER&gt;</special>
    </desc>
    XML;
    
    // Parse with SimpleXML, then switch to the DOM view of the same element
    $sx = new SimpleXMLElement($xml);
    $dom = dom_import_simplexml($sx);
    
    $html = '';
    foreach ($dom->childNodes as $node)
    {
        switch ($node->nodeType)
        {
            case XML_ELEMENT_NODE:
                // <special> becomes <em>; any other element is skipped
                if ($node->tagName === 'special') {
                    $html .= '<em>'.htmlspecialchars($node->textContent).'</em>';
                }
                break;
            case XML_TEXT_NODE:
                // Re-escape text so < and > stay entities in the HTML
                $html .= htmlspecialchars($node->data);
                break;
        }
    }
    
    echo trim($html);
    

    Output:

    These are the <em>&lt;best&gt;</em> chocolate chip cookies <em>&lt;EVER&gt;</em>
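
    If the description can contain nested elements or CDATA sections, the same loop could be made recursive. The sketch below is one possible variation, not part of the answer above: renderInline() and its $tagMap parameter are hypothetical names, and unlike the loop above it keeps the text of unmapped elements instead of dropping them.

```php
<?php
// Hypothetical helper: renders the children of $parent as HTML,
// renaming elements according to $tagMap (XML name => HTML tag).
function renderInline(DOMNode $parent, array $tagMap): string
{
    $html = '';
    foreach ($parent->childNodes as $node) {
        if ($node instanceof DOMText) {
            // Covers text nodes and CDATA sections (DOMCdataSection extends DOMText).
            $html .= htmlspecialchars($node->data);
        } elseif ($node instanceof DOMElement) {
            $inner = renderInline($node, $tagMap);
            $tag = $tagMap[$node->tagName] ?? null;
            // Unmapped elements contribute their content without a wrapper tag.
            $html .= $tag === null ? $inner : "<$tag>$inner</$tag>";
        }
    }
    return $html;
}

$sx = new SimpleXMLElement(
    '<desc>These are the <special>&lt;best&gt;</special> cookies <special>&lt;EVER&gt;</special></desc>'
);
$html = renderInline(dom_import_simplexml($sx), ['special' => 'em']);
echo $html, "\n";
```

    Passing the mapping as an array keeps the tag names out of the traversal logic, so further inline elements can be supported by adding entries.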
    

  2. XSLT is a language designed for exactly that: transforming an XML document into another XML or HTML document. PHP supports XSLT 1.0 through ext/xsl.

    <?php
    $xml = <<<'XML'
    <desc>
        These are the <special>&lt;best&gt;</special>
        chocolate chip cookies <special>&lt;EVER&gt;</special>
    </desc>
    XML;
    
    $xslt = <<<'XSLT'
    <xsl:stylesheet
      version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
      <xsl:output method="html"/>
    
      <xsl:template match="/desc">
        <xsl:apply-templates/>
      </xsl:template>
    
      <xsl:template match="special">
        <em><xsl:apply-templates/></em>
      </xsl:template>
    
    </xsl:stylesheet>
    XSLT;
    
    // load the content
    $content = new DOMDocument();
    $content->loadXML($xml);
    // load the template
    $template = new DOMDocument();
    $template->loadXML($xslt);
    // bootstrap XSLT
    $processor = new XSLTProcessor();
    $processor->importStylesheet($template);
    // transform and output
    echo $processor->transformToXml($content);
    

    Output:

        These are the <em>&lt;best&gt;</em>
        chocolate chip cookies <em>&lt;EVER&gt;</em>
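
    Since the question starts from SimpleXML and the <desc> element is presumably nested inside a larger document, one way to reuse the stylesheet is to copy just that element into its own DOMDocument so that match="/desc" still applies. This is a sketch under those assumptions; the <recipe> and <name> elements are invented for illustration.

```php
<?php
// Assumed sample document: <desc> is nested, unlike the standalone example.
$sx = new SimpleXMLElement(
    '<recipe><name>Cookies</name><desc>The <special>&lt;best&gt;</special> cookies</desc></recipe>'
);

$xslt = <<<'XSLT'
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/desc"><xsl:apply-templates/></xsl:template>
  <xsl:template match="special"><em><xsl:apply-templates/></em></xsl:template>
</xsl:stylesheet>
XSLT;

// Load the stylesheet once; the processor can be reused for many fragments.
$template = new DOMDocument();
$template->loadXML($xslt);
$processor = new XSLTProcessor();
$processor->importStylesheet($template);

// Copy only the <desc> element into a fresh document so "/desc" matches its root.
$fragment = new DOMDocument();
$fragment->appendChild($fragment->importNode(dom_import_simplexml($sx->desc), true));

$html = trim($processor->transformToXml($fragment));
echo $html, "\n";
```

    Transforming the whole document instead would also emit the text of sibling elements such as <name>, because XSLT's built-in templates copy text nodes that no template handles.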
    