How to take and display the content and tags of the xml file? - PHP

ogerichard
December 31, 2022
2 Answers

I would like to extract the tags and their contents from an XML file and display them in a table. I wrote a regex to do this, but it does not work as I expected.

Here is my XML file:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
   </book>
   <book id="bk104">
      <author>Corets, Eva</author>
      <title>Oberon's Legacy</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-03-10</publish_date>
      <description>In post-apocalypse England, the mysterious 
      agent known only as Oberon helps to create a new life 
      for the inhabitants of London. Sequel to Maeve 
      Ascendant.</description>
   </book>
   <book id="bk105">
      <author>Corets, Eva</author>
      <title>The Sundered Grail</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-09-10</publish_date>
      <description>The two daughters of Maeve, half-sisters, 
      battle one another for control of England. Sequel to 
      Oberon's Legacy.</description>
   </book>
   <book id="bk106">
      <author>Randall, Cynthia</author>
      <title>Lover Birds</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-09-02</publish_date>
      <description>When Carla meets Paul at an ornithology 
      conference, tempers fly as feathers get ruffled.</description>
   </book>
   <book id="bk107">
      <author>Thurman, Paula</author>
      <title>Splish Splash</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-11-02</publish_date>
      <description>A deep sea diver finds true love twenty 
      thousand leagues beneath the sea.</description>
   </book>
   <book id="bk108">
      <author>Knorr, Stefan</author>
      <title>Creepy Crawlies</title>
      <genre>Horror</genre>
      <price>4.95</price>
      <publish_date>2000-12-06</publish_date>
      <description>An anthology of horror stories about roaches,
      centipedes, scorpions  and other insects.</description>
   </book>
   <book id="bk109">
      <author>Kress, Peter</author>
      <title>Paradox Lost</title>
      <genre>Science Fiction</genre>
      <price>6.95</price>
      <publish_date>2000-11-02</publish_date>
      <description>After an inadvertant trip through a Heisenberg
      Uncertainty Device, James Salway discovers the problems 
      of being quantum.</description>
   </book>
   <book id="bk110">
      <author>O'Brien, Tim</author>
      <title>Microsoft .NET: The Programming Bible</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2000-12-09</publish_date>
      <description>Microsoft's .NET initiative is explored in 
      detail in this deep programmer's reference.</description>
   </book>
   <book id="bk111">
      <author>O'Brien, Tim</author>
      <title>MSXML3: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2000-12-01</publish_date>
      <description>The Microsoft MSXML3 parser is covered in 
      detail, with attention to XML DOM interfaces, XSLT processing, 
      SAX and more.</description>
   </book>
   <book id="bk112">
      <author>Galos, Mike</author>
      <title>Visual Studio 7: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>49.95</price>
      <publish_date>2001-04-16</publish_date>
      <description>Microsoft Visual Studio 7 is explored in depth,
      looking at how Visual Basic, Visual C++, C#, and ASP+ are 
      integrated into a comprehensive development 
      environment.</description>
   </book>
</catalog>

Here is the regex I used:

preg_match_all("|<[^>]+>(.*)</[^>]+>|U", $content, $matches, PREG_SET_ORDER) ;

Here is the result of this regex:

array(60) {
  [0]=>
  array(2) {
    [0]=>
    string(37) "<author>Gambardella, Matthew</author>"
    [1]=>
    string(20) "Gambardella, Matthew"
  }
  [1]=>
  array(2) {
    [0]=>
    string(36) "<title>XML Developer's Guide</title>"
    [1]=>
    string(21) "XML Developer's Guide"
  }
  [2]=>
  array(2) {
    [0]=>
    string(23) "<genre>Computer</genre>"
    [1]=>
    string(8) "Computer"
  }
  [3]=>
  array(2) {
    [0]=>
    string(20) "<price>44.95</price>"
    [1]=>
    string(5) "44.95"
  }
  [4]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-10-01</publish_date>"
    [1]=>
    string(10) "2000-10-01"
  }
  [5]=>
  array(2) {
    [0]=>
    string(27) "<author>Ralls, Kim</author>"
    [1]=>
    string(10) "Ralls, Kim"
  }
  [6]=>
  array(2) {
    [0]=>
    string(28) "<title>Midnight Rain</title>"
    [1]=>
    string(13) "Midnight Rain"
  }
  [7]=>
  array(2) {
    [0]=>
    string(22) "<genre>Fantasy</genre>"
    [1]=>
    string(7) "Fantasy"
  }
  [8]=>
  array(2) {
    [0]=>
    string(19) "<price>5.95</price>"
    [1]=>
    string(4) "5.95"
  }
  [9]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-12-16</publish_date>"
    [1]=>
    string(10) "2000-12-16"
  }
  [10]=>
  array(2) {
    [0]=>
    string(28) "<author>Corets, Eva</author>"
    [1]=>
    string(11) "Corets, Eva"
  }
  [11]=>
  array(2) {
    [0]=>
    string(30) "<title>Maeve Ascendant</title>"
    [1]=>
    string(15) "Maeve Ascendant"
  }
  [12]=>
  array(2) {
    [0]=>
    string(22) "<genre>Fantasy</genre>"
    [1]=>
    string(7) "Fantasy"
  }
  [13]=>
  array(2) {
    [0]=>
    string(19) "<price>5.95</price>"
    [1]=>
    string(4) "5.95"
  }
  [14]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-11-17</publish_date>"
    [1]=>
    string(10) "2000-11-17"
  }
  [15]=>
  array(2) {
    [0]=>
    string(28) "<author>Corets, Eva</author>"
    [1]=>
    string(11) "Corets, Eva"
  }
  [16]=>
  array(2) {
    [0]=>
    string(30) "<title>Oberon's Legacy</title>"
    [1]=>
    string(15) "Oberon's Legacy"
  }
  [17]=>
  array(2) {
    [0]=>
    string(22) "<genre>Fantasy</genre>"
    [1]=>
    string(7) "Fantasy"
  }
  [18]=>
  array(2) {
    [0]=>
    string(19) "<price>5.95</price>"
    [1]=>
    string(4) "5.95"
  }
  [19]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2001-03-10</publish_date>"
    [1]=>
    string(10) "2001-03-10"
  }
  [20]=>
  array(2) {
    [0]=>
    string(28) "<author>Corets, Eva</author>"
    [1]=>
    string(11) "Corets, Eva"
  }
  [21]=>
  array(2) {
    [0]=>
    string(33) "<title>The Sundered Grail</title>"
    [1]=>
    string(18) "The Sundered Grail"
  }
  [22]=>
  array(2) {
    [0]=>
    string(22) "<genre>Fantasy</genre>"
    [1]=>
    string(7) "Fantasy"
  }
  [23]=>
  array(2) {
    [0]=>
    string(19) "<price>5.95</price>"
    [1]=>
    string(4) "5.95"
  }
  [24]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2001-09-10</publish_date>"
    [1]=>
    string(10) "2001-09-10"
  }
  [25]=>
  array(2) {
    [0]=>
    string(33) "<author>Randall, Cynthia</author>"
    [1]=>
    string(16) "Randall, Cynthia"
  }
  [26]=>
  array(2) {
    [0]=>
    string(26) "<title>Lover Birds</title>"
    [1]=>
    string(11) "Lover Birds"
  }
  [27]=>
  array(2) {
    [0]=>
    string(22) "<genre>Romance</genre>"
    [1]=>
    string(7) "Romance"
  }
  [28]=>
  array(2) {
    [0]=>
    string(19) "<price>4.95</price>"
    [1]=>
    string(4) "4.95"
  }
  [29]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-09-02</publish_date>"
    [1]=>
    string(10) "2000-09-02"
  }
  [30]=>
  array(2) {
    [0]=>
    string(31) "<author>Thurman, Paula</author>"
    [1]=>
    string(14) "Thurman, Paula"
  }
  [31]=>
  array(2) {
    [0]=>
    string(28) "<title>Splish Splash</title>"
    [1]=>
    string(13) "Splish Splash"
  }
  [32]=>
  array(2) {
    [0]=>
    string(22) "<genre>Romance</genre>"
    [1]=>
    string(7) "Romance"
  }
  [33]=>
  array(2) {
    [0]=>
    string(19) "<price>4.95</price>"
    [1]=>
    string(4) "4.95"
  }
  [34]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-11-02</publish_date>"
    [1]=>
    string(10) "2000-11-02"
  }
  [35]=>
  array(2) {
    [0]=>
    string(30) "<author>Knorr, Stefan</author>"
    [1]=>
    string(13) "Knorr, Stefan"
  }
  [36]=>
  array(2) {
    [0]=>
    string(30) "<title>Creepy Crawlies</title>"
    [1]=>
    string(15) "Creepy Crawlies"
  }
  [37]=>
  array(2) {
    [0]=>
    string(21) "<genre>Horror</genre>"
    [1]=>
    string(6) "Horror"
  }
  [38]=>
  array(2) {
    [0]=>
    string(19) "<price>4.95</price>"
    [1]=>
    string(4) "4.95"
  }
  [39]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-12-06</publish_date>"
    [1]=>
    string(10) "2000-12-06"
  }
  [40]=>
  array(2) {
    [0]=>
    string(29) "<author>Kress, Peter</author>"
    [1]=>
    string(12) "Kress, Peter"
  }
  [41]=>
  array(2) {
    [0]=>
    string(27) "<title>Paradox Lost</title>"
    [1]=>
    string(12) "Paradox Lost"
  }
  [42]=>
  array(2) {
    [0]=>
    string(30) "<genre>Science Fiction</genre>"
    [1]=>
    string(15) "Science Fiction"
  }
  [43]=>
  array(2) {
    [0]=>
    string(19) "<price>6.95</price>"
    [1]=>
    string(4) "6.95"
  }
  [44]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-11-02</publish_date>"
    [1]=>
    string(10) "2000-11-02"
  }
  [45]=>
  array(2) {
    [0]=>
    string(29) "<author>O'Brien, Tim</author>"
    [1]=>
    string(12) "O'Brien, Tim"
  }
  [46]=>
  array(2) {
    [0]=>
    string(52) "<title>Microsoft .NET: The Programming Bible</title>"
    [1]=>
    string(37) "Microsoft .NET: The Programming Bible"
  }
  [47]=>
  array(2) {
    [0]=>
    string(23) "<genre>Computer</genre>"
    [1]=>
    string(8) "Computer"
  }
  [48]=>
  array(2) {
    [0]=>
    string(20) "<price>36.95</price>"
    [1]=>
    string(5) "36.95"
  }
  [49]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-12-09</publish_date>"
    [1]=>
    string(10) "2000-12-09"
  }
  [50]=>
  array(2) {
    [0]=>
    string(29) "<author>O'Brien, Tim</author>"
    [1]=>
    string(12) "O'Brien, Tim"
  }
  [51]=>
  array(2) {
    [0]=>
    string(44) "<title>MSXML3: A Comprehensive Guide</title>"
    [1]=>
    string(29) "MSXML3: A Comprehensive Guide"
  }
  [52]=>
  array(2) {
    [0]=>
    string(23) "<genre>Computer</genre>"
    [1]=>
    string(8) "Computer"
  }
  [53]=>
  array(2) {
    [0]=>
    string(20) "<price>36.95</price>"
    [1]=>
    string(5) "36.95"
  }
  [54]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-12-01</publish_date>"
    [1]=>
    string(10) "2000-12-01"
  }
  [55]=>
  array(2) {
    [0]=>
    string(28) "<author>Galos, Mike</author>"
    [1]=>
    string(11) "Galos, Mike"
  }
  [56]=>
  array(2) {
    [0]=>
    string(53) "<title>Visual Studio 7: A Comprehensive Guide</title>"
    [1]=>
    string(38) "Visual Studio 7: A Comprehensive Guide"
  }
  [57]=>
  array(2) {
    [0]=>
    string(23) "<genre>Computer</genre>"
    [1]=>
    string(8) "Computer"
  }
  [58]=>
  array(2) {
    [0]=>
    string(20) "<price>49.95</price>"
    [1]=>
    string(5) "49.95"
  }
  [59]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2001-04-16</publish_date>"
    [1]=>
    string(10) "2001-04-16"
  }
}

The problem is that this regex does not return all the content of the XML file. I think the missing tags are skipped because of their attributes. How can I get the tags that are not displayed? What should I change in the regex?

Tags: php xml

Answers

A regex can extract data from an XML string, but it does not understand nodes or their hierarchy, so it is only useful in very specific cases. Regexes for XML also become complex very quickly.
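To illustrate how fragile this is, here is a minimal sketch run against the question's own data (a one-book excerpt inlined as a string, an assumption made only so the snippet runs on its own). The entries missing from the question's output are actually the multi-line `<description>` elements, not the elements with attributes: without the `s` modifier, `.` does not match a line break, so `(.*)` can never reach `</description>`. The alternative pattern below happens to work for this flat sample, but it still returns entities such as `&amp;` undecoded and breaks on CDATA sections, comments or nested markup.

```php
<?php
// One-book excerpt of the question's XML, inlined for a self-contained demo.
$content = <<<'XML'
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
</catalog>
XML;

// The question's pattern: without the "s" modifier "." does not match a
// line break, so the multi-line <description> element is never matched.
preg_match_all('|<[^>]+>(.*)</[^>]+>|U', $content, $original);

// Alternative pattern (illustrative only): capture the tag name, allow
// attributes, accept any text without "<" (line breaks included) and
// require the same closing tag via the back-reference \1.
preg_match_all('#<(\w+)[^>]*>([^<]*)</\1>#', $content, $leaves, PREG_SET_ORDER);

echo count($original[0]), " matches with the original pattern\n";
foreach ($leaves as $match) {
    echo $match[1], ': ', $match[2], "\n";   // tag name: text content
}
```

The original pattern finds only `author` and `title` here, while the alternative also finds `description`; neither captures the `id` attributes, which is one more reason to switch to a parser.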

Use an XML parser for reading and an XSLT processor for transforming. XPath expressions let you fetch specific nodes or values.
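As a small sketch of that last point (assuming PHP's standard `dom` extension and a two-book excerpt of the catalog inlined as a string), `DOMXPath::evaluate()` returns a node list for a location path, but a plain scalar when the expression is a function such as `count()`, `string()` or `sum()`:

```php
<?php
// Two-book excerpt of the question's catalog, inlined for a self-contained demo.
$xml = <<<'XML'
<catalog>
   <book id="bk101">
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
   </book>
   <book id="bk102">
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
   </book>
</catalog>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// A location path returns a DOMNodeList of the matching nodes.
$fantasyTitles = [];
foreach ($xpath->evaluate('/catalog/book[genre = "Fantasy"]/title') as $title) {
    $fantasyTitles[] = $title->textContent;
}

// Expressions that produce a number or a string return a PHP scalar directly.
$bookCount = $xpath->evaluate('count(/catalog/book)');         // float
$firstId   = $xpath->evaluate('string(/catalog/book[1]/@id)'); // string
$total     = $xpath->evaluate('sum(/catalog/book/price)');     // float

printf("%s | %d books | first id %s | total %.2f\n",
    implode(', ', $fantasyTitles), $bookCount, $firstId, $total);
```

Note that XPath numbers come back as PHP floats, so a sum of prices is subject to the usual floating-point rounding and should be formatted for display.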

Here is a basic example using DOM:

<?php
// bootstrap DOM + XPath
$document = new DOMDocument();
$document->loadXML(getXMLString());
$xpath = new DOMXPath($document);

// iterate "book" elements
foreach ($xpath->evaluate('/catalog/book') as $book) {
    var_dump(
        [
            // read the "id" attribute
            'id' => $book->getAttribute('id'),
            // fetch first "title" element child as string
            'title' => $xpath->evaluate('string(title)', $book)
        ]
    );
}

function getXMLString(): string {
    return <<<'XML'
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>
XML;
}
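Building on that loop, here is a sketch of what the question actually asks for: every child element of every book, shown as an HTML table. It assumes the `dom` extension and uses a shortened inline copy of the catalog so it runs on its own; with the real file you would call `$document->load('catalog.xml')` instead. The names `$rows` and `$html` are illustrative. The XPath `*` selects element children only, so the whitespace between tags is skipped, and `normalize-space()` collapses the line breaks inside the descriptions. Values go through `htmlspecialchars()` before being written into HTML.

```php
<?php
// Shortened inline copy of catalog.xml (with the real file: $document->load('catalog.xml')).
$xml = <<<'XML'
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <description>A former architect battles corporate zombies,
      an evil sorceress, and her own childhood to become queen
      of the world.</description>
   </book>
</catalog>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// One row per book: the id attribute plus every child element's name and text.
$rows = [];
foreach ($xpath->evaluate('/catalog/book') as $book) {
    $row = ['id' => $book->getAttribute('id')];
    foreach ($xpath->evaluate('*', $book) as $field) {
        // normalize-space() trims and collapses the line breaks inside <description>
        $row[$field->nodeName] = $xpath->evaluate('normalize-space(.)', $field);
    }
    $rows[] = $row;
}

// Render the rows as an HTML table, taking the headers from the first row's keys.
$html = "<table>\n<tr>";
foreach (array_keys($rows[0]) as $header) {
    $html .= '<th>' . htmlspecialchars($header) . '</th>';
}
$html .= "</tr>\n";
foreach ($rows as $row) {
    $html .= '<tr>';
    foreach ($row as $value) {
        $html .= '<td>' . htmlspecialchars($value) . '</td>';
    }
    $html .= "</tr>\n";
}
$html .= "</table>\n";
echo $html;
```

Because the header row comes from the first book, this assumes every `<book>` has the same children in the same order; for irregular data you would first collect the union of all field names.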

"The right tool for the right job" is a commonly cited expression, and in my opinion a regex is not the right tool for parsing XML. Presenting the contents of an XML file as a table is best accomplished with XSL Transformations (XSLT).

Given the original XML saved as catalog.xml, a simple XSL stylesheet can generate the entire HTML table, with the content drawn directly from the XML.

catalog.xsl

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" standalone="yes" indent="yes" encoding="utf-8"/>
    <xsl:template match="/">
        <style>
            table.xml{
                border:1px solid gray;
                display:table;
                border-collapse:collapse;
                border-spacing:0px;
                empty-cells:show;
                table-layout:auto;
                font-family:Verdana, Geneva, Arial, Helvetica, sans-serif;
                font-size:0.8rem;
            }
            table.xml tr:first-of-type{
                background:#2f4f4f;
                color:white;
            }
            table.xml tr:nth-of-type( even ) td{
                background:aliceblue;
            }
            table.xml td{
                margin:0;
                padding:5px;
                border-bottom:1px dotted rgba(133,133,133,0.5);
            }
        </style>
        <table class="xml" border="1" cellspacing="5" cellpadding="5">
            <tr>
                <th>Author</th>
                <th>Title</th>
                <th>Genre</th>
                <th>Price</th>
                <th>Published</th>
                <th>Description</th>
            </tr>
            <xsl:for-each select="catalog/book">
            <tr>
                <xsl:attribute name='data-id'>
                    <xsl:value-of select="@id"/>
                </xsl:attribute>
                <xsl:for-each select="*">
                    <td>
                        <xsl:value-of select="text()" />
                    </td>
                </xsl:for-each>
            </tr>
            </xsl:for-each>
        </table>
    </xsl:template>
</xsl:stylesheet>

To use the XSL within PHP:

<!DOCTYPE html>
<html lang='en'>
    <head>
        <meta charset='utf-8' />
        <title>XSLT & XML</title>
    </head>
    <body>
    <?php
    
        $xmlfile='catalog.xml';
        $xslfile='catalog.xsl';
        
        $xml=new DOMDocument;
        $xml->load( $xmlfile );
        
        $xsl=new DOMDocument;
        $xsl->load( $xslfile );
        
        $xslt=new XSLTProcessor;
        $xslt->importStyleSheet( $xsl );
        $html=$xslt->transformToXML( $xml );
        
        echo $html;
    ?>
    </body>
</html>

This yields an HTML table with a header row followed by one row per book.

To display both the tag name and the content (again using XSLT), the XSL file needs only a small change: inside the <xsl:for-each select="*"> loop, also output the element name, for example:

<xsl:value-of select="name()" /> | <xsl:value-of select="text()" />

With this modification each cell shows the tag name and its value, for example "author | Gambardella, Matthew".

To process the XML with DOMDocument alone, storing for each element both its serialized XML (tag and value) and its text value as strings, you could do this:

$output=array();

$dom=new DomDocument;
$dom->load('catalog.xml');
$books=$dom->getElementsByTagName('book');

foreach( $books as $book ){
    $id=$book->getAttribute('id');
    $tmp=array();
    
    for( $i=0; $i < $book->childNodes->length; $i++ ){
        if( $book->childNodes[ $i ]->nodeType===XML_ELEMENT_NODE ){
            $node=$book->childNodes[ $i ];

            $tmp[]=$dom->saveXML( $node );
            $tmp[]=$node->nodeValue;
        }
    }
    
    $output[$id]=$tmp;
}

# to illustrate output (escaped so the markup is shown as text)
printf('<textarea rows=20 cols=100>%s</textarea>', htmlspecialchars(print_r($output, true)));

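This yields an array keyed by book id, in which each element's serialized XML (for example "<author>Gambardella, Matthew</author>") is followed by its text value.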
