I would like to extract the tags and their contents from an XML file and display them in a table. I wrote a regex for this, but it doesn't work as I expected.
Here is my XML file:
`
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
<book id="bk105">
<author>Corets, Eva</author>
<title>The Sundered Grail</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-09-10</publish_date>
<description>The two daughters of Maeve, half-sisters,
battle one another for control of England. Sequel to
Oberon's Legacy.</description>
</book>
<book id="bk106">
<author>Randall, Cynthia</author>
<title>Lover Birds</title>
<genre>Romance</genre>
<price>4.95</price>
<publish_date>2000-09-02</publish_date>
<description>When Carla meets Paul at an ornithology
conference, tempers fly as feathers get ruffled.</description>
</book>
<book id="bk107">
<author>Thurman, Paula</author>
<title>Splish Splash</title>
<genre>Romance</genre>
<price>4.95</price>
<publish_date>2000-11-02</publish_date>
<description>A deep sea diver finds true love twenty
thousand leagues beneath the sea.</description>
</book>
<book id="bk108">
<author>Knorr, Stefan</author>
<title>Creepy Crawlies</title>
<genre>Horror</genre>
<price>4.95</price>
<publish_date>2000-12-06</publish_date>
<description>An anthology of horror stories about roaches,
centipedes, scorpions and other insects.</description>
</book>
<book id="bk109">
<author>Kress, Peter</author>
<title>Paradox Lost</title>
<genre>Science Fiction</genre>
<price>6.95</price>
<publish_date>2000-11-02</publish_date>
<description>After an inadvertant trip through a Heisenberg
Uncertainty Device, James Salway discovers the problems
of being quantum.</description>
</book>
<book id="bk110">
<author>O'Brien, Tim</author>
<title>Microsoft .NET: The Programming Bible</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2000-12-09</publish_date>
<description>Microsoft's .NET initiative is explored in
detail in this deep programmer's reference.</description>
</book>
<book id="bk111">
<author>O'Brien, Tim</author>
<title>MSXML3: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2000-12-01</publish_date>
<description>The Microsoft MSXML3 parser is covered in
detail, with attention to XML DOM interfaces, XSLT processing,
SAX and more.</description>
</book>
<book id="bk112">
<author>Galos, Mike</author>
<title>Visual Studio 7: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>49.95</price>
<publish_date>2001-04-16</publish_date>
<description>Microsoft Visual Studio 7 is explored in depth,
looking at how Visual Basic, Visual C++, C#, and ASP+ are
integrated into a comprehensive development
environment.</description>
</book>
</catalog>
`
Here is the regex I used:
preg_match_all("|<[^>]+>(.*)</[^>]+>|U", $content, $matches, PREG_SET_ORDER);
Here is the result of this regex:
`
array(60) {
[0]=>
array(2) {
[0]=>
string(37) "<author>Gambardella, Matthew</author>"
[1]=>
string(20) "Gambardella, Matthew"
}
[1]=>
array(2) {
[0]=>
string(36) "<title>XML Developer's Guide</title>"
[1]=>
string(21) "XML Developer's Guide"
}
[2]=>
array(2) {
[0]=>
string(23) "<genre>Computer</genre>"
[1]=>
string(8) "Computer"
}
[3]=>
array(2) {
[0]=>
string(20) "<price>44.95</price>"
[1]=>
string(5) "44.95"
}
[4]=>
array(2) {
[0]=>
string(39) "<publish_date>2000-10-01</publish_date>"
[1]=>
string(10) "2000-10-01"
}
[5]=>
array(2) {
[0]=>
string(27) "<author>Ralls, Kim</author>"
[1]=>
string(10) "Ralls, Kim"
}
[6]=>
array(2) {
[0]=>
string(28) "<title>Midnight Rain</title>"
[1]=>
string(13) "Midnight Rain"
}
[7]=>
array(2) {
[0]=>
string(22) "<genre>Fantasy</genre>"
[1]=>
string(7) "Fantasy"
}
[8]=>
array(2) {
[0]=>
string(19) "<price>5.95</price>"
[1]=>
string(4) "5.95"
}
[9]=>
array(2) {
[0]=>
string(39) "<publish_date>2000-12-16</publish_date>"
[1]=>
string(10) "2000-12-16"
}
[10]=>
array(2) {
[0]=>
string(28) "<author>Corets, Eva</author>"
[1]=>
string(11) "Corets, Eva"
}
[11]=>
array(2) {
[0]=>
string(30) "<title>Maeve Ascendant</title>"
[1]=>
string(15) "Maeve Ascendant"
}
[12]=>
array(2) {
[0]=>
string(22) "<genre>Fantasy</genre>"
[1]=>
string(7) "Fantasy"
}
[13]=>
array(2) {
[0]=>
string(19) "<price>5.95</price>"
[1]=>
string(4) "5.95"
}
[14]=>
array(2) {
[0]=>
string(39) "<publish_date>2000-11-17</publish_date>"
[1]=>
string(10) "2000-11-17"
}
[15]=>
array(2) {
[0]=>
string(28) "<author>Corets, Eva</author>"
[1]=>
string(11) "Corets, Eva"
}
[16]=>
array(2) {
[0]=>
string(30) "<title>Oberon's Legacy</title>"
[1]=>
string(15) "Oberon's Legacy"
}
[17]=>
array(2) {
[0]=>
string(22) "<genre>Fantasy</genre>"
[1]=>
string(7) "Fantasy"
}
[18]=>
array(2) {
[0]=>
string(19) "<price>5.95</price>"
[1]=>
string(4) "5.95"
}
[19]=>
array(2) {
[0]=>
string(39) "<publish_date>2001-03-10</publish_date>"
[1]=>
string(10) "2001-03-10"
}
[20]=>
array(2) {
[0]=>
string(28) "<author>Corets, Eva</author>"
[1]=>
string(11) "Corets, Eva"
}
[21]=>
array(2) {
[0]=>
string(33) "<title>The Sundered Grail</title>"
[1]=>
string(18) "The Sundered Grail"
}
[22]=>
array(2) {
[0]=>
string(22) "<genre>Fantasy</genre>"
[1]=>
string(7) "Fantasy"
}
[23]=>
array(2) {
[0]=>
string(19) "<price>5.95</price>"
[1]=>
string(4) "5.95"
}
[24]=>
array(2) {
[0]=>
string(39) "<publish_date>2001-09-10</publish_date>"
[1]=>
string(10) "2001-09-10"
}
[25]=>
array(2) {
[0]=>
string(33) "<author>Randall, Cynthia</author>"
[1]=>
string(16) "Randall, Cynthia"
}
[26]=>
array(2) {
[0]=>
string(26) "<title>Lover Birds</title>"
[1]=>
string(11) "Lover Birds"
}
[27]=>
array(2) {
[0]=>
string(22) "<genre>Romance</genre>"
[1]=>
string(7) "Romance"
}
[28]=>
array(2) {
[0]=>
string(19) "<price>4.95</price>"
[1]=>
string(4) "4.95"
}
[29]=>
array(2) {
[0]=>
string(39) "<publish_date>2000-09-02</publish_date>"
[1]=>
string(10) "2000-09-02"
}
[30]=>
array(2) {
[0]=>
string(31) "<author>Thurman, Paula</author>"
[1]=>
string(14) "Thurman, Paula"
}
[31]=>
array(2) {
[0]=>
string(28) "<title>Splish Splash</title>"
[1]=>
string(13) "Splish Splash"
}
[32]=>
array(2) {
[0]=>
string(22) "<genre>Romance</genre>"
[1]=>
string(7) "Romance"
}
[33]=>
array(2) {
[0]=>
string(19) "<price>4.95</price>"
[1]=>
string(4) "4.95"
}
[34]=>
array(2) {
[0]=>
string(39) "<publish_date>2000-11-02</publish_date>"
[1]=>
string(10) "2000-11-02"
}
[35]=>
array(2) {
[0]=>
string(30) "<author>Knorr, Stefan</author>"
[1]=>
string(13) "Knorr, Stefan"
}
[36]=>
array(2) {
[0]=>
string(30) "<title>Creepy Crawlies</title>"
[1]=>
string(15) "Creepy Crawlies"
}
[37]=>
array(2) {
[0]=>
string(21) "<genre>Horror</genre>"
[1]=>
string(6) "Horror"
}
[38]=>
array(2) {
[0]=>
string(19) "<price>4.95</price>"
[1]=>
string(4) "4.95"
}
[39]=>
array(2) {
[0]=>
string(39) "<publish_date>2000-12-06</publish_date>"
[1]=>
string(10) "2000-12-06"
}
[40]=>
array(2) {
[0]=>
string(29) "<author>Kress, Peter</author>"
[1]=>
string(12) "Kress, Peter"
}
[41]=>
array(2) {
[0]=>
string(27) "<title>Paradox Lost</title>"
[1]=>
string(12) "Paradox Lost"
}
[42]=>
array(2) {
[0]=>
string(30) "<genre>Science Fiction</genre>"
[1]=>
string(15) "Science Fiction"
}
[43]=>
array(2) {
[0]=>
string(19) "<price>6.95</price>"
[1]=>
string(4) "6.95"
}
[44]=>
array(2) {
[0]=>
string(39) "<publish_date>2000-11-02</publish_date>"
[1]=>
string(10) "2000-11-02"
}
[45]=>
array(2) {
[0]=>
string(29) "<author>O'Brien, Tim</author>"
[1]=>
string(12) "O'Brien, Tim"
}
[46]=>
array(2) {
[0]=>
string(52) "<title>Microsoft .NET: The Programming Bible</title>"
[1]=>
string(37) "Microsoft .NET: The Programming Bible"
}
[47]=>
array(2) {
[0]=>
string(23) "<genre>Computer</genre>"
[1]=>
string(8) "Computer"
}
[48]=>
array(2) {
[0]=>
string(20) "<price>36.95</price>"
[1]=>
string(5) "36.95"
}
[49]=>
array(2) {
[0]=>
string(39) "<publish_date>2000-12-09</publish_date>"
[1]=>
string(10) "2000-12-09"
}
[50]=>
array(2) {
[0]=>
string(29) "<author>O'Brien, Tim</author>"
[1]=>
string(12) "O'Brien, Tim"
}
[51]=>
array(2) {
[0]=>
string(44) "<title>MSXML3: A Comprehensive Guide</title>"
[1]=>
string(29) "MSXML3: A Comprehensive Guide"
}
[52]=>
array(2) {
[0]=>
string(23) "<genre>Computer</genre>"
[1]=>
string(8) "Computer"
}
[53]=>
array(2) {
[0]=>
string(20) "<price>36.95</price>"
[1]=>
string(5) "36.95"
}
[54]=>
array(2) {
[0]=>
string(39) "<publish_date>2000-12-01</publish_date>"
[1]=>
string(10) "2000-12-01"
}
[55]=>
array(2) {
[0]=>
string(28) "<author>Galos, Mike</author>"
[1]=>
string(11) "Galos, Mike"
}
[56]=>
array(2) {
[0]=>
string(53) "<title>Visual Studio 7: A Comprehensive Guide</title>"
[1]=>
string(38) "Visual Studio 7: A Comprehensive Guide"
}
[57]=>
array(2) {
[0]=>
string(23) "<genre>Computer</genre>"
[1]=>
string(8) "Computer"
}
[58]=>
array(2) {
[0]=>
string(20) "<price>49.95</price>"
[1]=>
string(5) "49.95"
}
[59]=>
array(2) {
[0]=>
string(39) "<publish_date>2001-04-16</publish_date>"
[1]=>
string(10) "2001-04-16"
}
}
`
The problem is that this regex does not return all of the content of the XML file, I think because some of the tags have attributes. How can I also get the tags that are currently missing from the result? What should I change in the regex?
2 Answers
RegEx can be used to extract data from an XML string, but it does not recognize the nodes or their hierarchy, so it is only useful for very specific cases. The pattern also gets complex very quickly.
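As a side note on the pattern from the question, the elements it skips are the ones whose content contains a line break, because `.` does not match a newline unless the `s` modifier is set. The sketch below runs the question's pattern on a shortened excerpt of the catalog (the excerpt and the variable names are only for illustration). It also shows that adding `s` makes the first match swallow a nested tag, which is the hierarchy problem described above.

```php
<?php
// Shortened excerpt of the catalog; the description spans two lines.
$content = <<<'XML'
<book id="bk101">
<title>XML Developer's Guide</title>
<description>An in-depth look at creating applications
with XML.</description>
</book>
XML;

// Pattern from the question: "." stops at every line break, so <book>
// and <description> (whose content contains a newline) never match.
preg_match_all("|<[^>]+>(.*)</[^>]+>|U", $content, $matches, PREG_SET_ORDER);
$withoutS = array_column($matches, 1);
print_r($withoutS);

// With the "s" modifier "." also matches newlines, but the pattern has
// no notion of nesting: the first match now starts at <book> and its
// captured group contains the opening <title> tag.
preg_match_all("|<[^>]+>(.*)</[^>]+>|Us", $content, $matches, PREG_SET_ORDER);
$withS = array_column($matches, 1);
print_r($withS);
```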
Use an XML parser for reading or an XSLT processor for transforming. XPath expressions let you fetch specific nodes or values.
Here is a basic example using DOM:
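The code of this answer is not preserved here, so what follows is only a minimal sketch of the approach it describes, not the original example. It assumes a shortened, inline excerpt of the catalog (a real script would load the file instead), and the `$rows` structure is a hypothetical choice for building a table later.

```php
<?php
// Shortened excerpt of the catalog, inlined so the sketch is self-contained.
$xml = <<<'XML'
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
</book>
</catalog>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

$rows = [];
// One row per <book>; the attribute is read through the DOM API.
foreach ($xpath->evaluate('/catalog/book') as $book) {
    $row = ['id' => $book->getAttribute('id')];
    // "*" selects every child element of the current book.
    foreach ($xpath->evaluate('*', $book) as $field) {
        // normalize-space() collapses the line breaks in multi-line values.
        $row[$field->localName] = $xpath->evaluate('normalize-space(.)', $field);
    }
    $rows[] = $row;
}
print_r($rows);
```

Because the parser works on nodes rather than on lines of text, attributes and multi-line descriptions need no special handling.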
"The right tool for the right job" is a commonly cited expression, and a regex to parse XML is not, in my opinion, the "right tool"!
The task of presenting the contents of an XML file in table form can best be accomplished with XSL Transformations. Given the original XML, saved as catalog.xml, a simple XSL stylesheet can be used to generate the entire HTML table with the content drawn directly from the XML.
catalog.xsl:
To use the XSL within PHP:
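The original stylesheet and PHP code are not preserved here, so the following is a sketch under stated assumptions rather than the answer's own code. The stylesheet is a guess at what a simple catalog.xsl could look like, and both it and a shortened catalog excerpt are inlined as strings so the example runs on its own. It needs PHP's xsl extension.

```php
<?php
// Shortened excerpt standing in for catalog.xml.
$xml = <<<'XML'
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
</book>
</catalog>
XML;

// Hypothetical stand-in for catalog.xsl: one table row per book,
// one cell per child element.
$xsl = <<<'XSL'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>
  <xsl:template match="/catalog">
    <table>
      <xsl:for-each select="book">
        <tr>
          <td><xsl:value-of select="@id"/></td>
          <xsl:for-each select="*">
            <td><xsl:value-of select="normalize-space(.)"/></td>
          </xsl:for-each>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
XSL;

$source = new DOMDocument();
$source->loadXML($xml);        // with files: $source->load('catalog.xml')

$stylesheet = new DOMDocument();
$stylesheet->loadXML($xsl);    // with files: $stylesheet->load('catalog.xsl')

$processor = new XSLTProcessor();
$processor->importStylesheet($stylesheet);

// transformToXml() returns the transformation result as a string.
$html = $processor->transformToXml($source);
echo $html;
```

In this sketch, one way to also print the tag name would be to output `name()` with an extra `xsl:value-of` inside the inner `xsl:for-each` loop.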
This yields:
To display the tag and the content, again using XSLT, the xsl file needs to be modified slightly. Within the <xsl:for-each select="*"> loop you also want to add the tag name, perhaps like this:
This modification yields:
To process the XML with DOMDocument only, storing both the tag with its value and the value on its own as strings, you could do something like this:
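The answer's code is not preserved here. A minimal sketch of the idea, assuming an inline excerpt of the catalog and a hypothetical `$entries` array that mirrors the two-element result of `preg_match_all` in the question, might look like this:

```php
<?php
// Shortened excerpt of the catalog, inlined so the sketch is self-contained.
$xml = <<<'XML'
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<description>An in-depth look at creating applications
with XML.</description>
</book>
</catalog>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

$entries = [];
foreach ($document->getElementsByTagName('book') as $book) {
    foreach ($book->childNodes as $child) {
        // Skip the whitespace-only text nodes between the elements.
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue;
        }
        $entries[] = [
            // The element serialized back to a string, tag included.
            'element' => $document->saveXML($child),
            // The text value alone, with line breaks collapsed to spaces.
            'value'   => trim(preg_replace('/\s+/', ' ', $child->textContent)),
        ];
    }
}
var_dump($entries);
```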
Which yields output like this: