I have the following portion of an HTML file:
```html
<span class="type">Servers:</span>
<li class="server">
<span class="bullet">-</span>
<span class="label"></span> (19345)
<ul class="components">
<span class="type">Components:</span>
<li class="component">
<span class="bullet">+</span>
<span class="label">WebSphere:APPLICATION_SERVER</span> (18234)
<ul class="applications">
<span class="type">Applications:</span>
<li class="application">
<span class="bullet">+</span>
<span class="label">Ipostmutuaintra</span> (981)
<ul class="transactions">
<span class="type">Transactions:</span>
<li class="transaction">
<span class="label">/IpostMutuaIntra</span>
</li>
<li class="transaction">
<span class="label">/IpostMutuaIntra//importo/pdfInoltrata/A_*
```
I'd like to extract the text between the tags:
Servers: 19345
Components: WebSphere:APPLICATION_SERVER (18234)
Applications: Ipostmutuaintra
Transactions: /IpostMutuaIntra
/IpostMutuaIntra//importo/pdfInoltrata/A_*
and store it in an array.
I wrote the following code, a function called from another main script, which uses XPath:
```php
<?php
// $xpath is a DOMXPath object created in the main script
$heading = parseToArray($xpath);
print_r($heading);
//echo "Script run on server " . strtoupper($heading[1]) . " for directory " . substr($heading[4], 9, -21);

function parseToArray($xpath)
{
    //$xpathquery = "//span[@class='".$class1."'][.='ITCAM for AD/WebSphere/J2EE']/following-sibling::ul/li[@class='".$class2."']/span";
    // The "Servers:" heading is a <span class="type">, not class="label"
    $xpathquery = "//span[@class='type'][.='Servers:']";
    $elements = $xpath->query($xpathquery);
    $resultarray = array();
    // DOMXPath::query() returns false (not null) if the expression is malformed
    if ($elements !== false) {
        foreach ($elements as $element) {
            // Collect the value of every child node of each matched span
            foreach ($element->childNodes as $node) {
                $resultarray[] = $node->nodeValue;
            }
        }
    }
    return $resultarray;
}
?>
```
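The main script isn't shown in the question. A minimal sketch of it, assuming the HTML is loaded from a string with `DOMDocument::loadHTML()` (the `$html` fragment and its `<div>` wrapper are illustrative, not taken from the original file), shows why only `Servers:` comes back: the query matches a single `<span>`, and that span's only child node is the text `Servers:`.

```php
<?php
// Hypothetical stand-in for the real HTML file: only the first level, wrapped in a <div>
$html = '<div><span class="type">Servers:</span>'
      . '<li class="server"><span class="bullet">-</span>'
      . '<span class="label"></span> (19345)</li></div>';

// Collect libxml parse warnings internally instead of printing them
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Same query and loop as parseToArray(): one matching <span>, whose only child is a text node
$result = [];
foreach ($xpath->query("//span[@class='type'][.='Servers:']") as $span) {
    foreach ($span->childNodes as $node) {
        $result[] = $node->nodeValue;
    }
}
// $result holds a single element: "Servers:"
print_r($result);
```

Nothing below that `<span>` is ever visited, so the other headings and labels have to be reached through other axes (siblings or descendants).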
It only grabs "Servers:", and I don't know how to continue.
How can I get the rest of the text as shown above?
2 Answers
If you start with the `p` that surrounds the `Servers:` span, it becomes easier to get the siblings, like so:

This code gets all the text you want, but the individual bits still have to be parsed out, since the input is rather unstructured.
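A rough sketch of this idea, assuming the question's fragment sits in a hypothetical `<div>` container (the answer mentions a surrounding `p`, but an HTML `<p>` cannot legally contain `<li>` or `<ul>` elements) and that the truncated snippet ends with the closing tags shown. Starting from the element that contains the `Servers:` span, it collects every non-blank text node except the `+`/`-` bullets. The result is a flat list, which illustrates why the pieces still need to be parsed out afterwards.

```php
<?php
// Illustrative fixture: the question's snippet in a hypothetical <div>,
// with closing tags assumed for the truncated end
$html = <<<'HTML'
<div>
<span class="type">Servers:</span>
<li class="server">
<span class="bullet">-</span>
<span class="label"></span> (19345)
<ul class="components">
<span class="type">Components:</span>
<li class="component">
<span class="bullet">+</span>
<span class="label">WebSphere:APPLICATION_SERVER</span> (18234)
<ul class="applications">
<span class="type">Applications:</span>
<li class="application">
<span class="bullet">+</span>
<span class="label">Ipostmutuaintra</span> (981)
<ul class="transactions">
<span class="type">Transactions:</span>
<li class="transaction">
<span class="label">/IpostMutuaIntra</span>
</li>
<li class="transaction">
<span class="label">/IpostMutuaIntra//importo/pdfInoltrata/A_*</span>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</div>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Start from the element that contains the "Servers:" span
$container = $xpath->query("//span[@class='type'][.='Servers:']/parent::*")->item(0);

// Every non-blank text node below it, in document order, skipping the bullets
$pieces = [];
$texts = $xpath->query(".//text()[normalize-space()][not(parent::span[@class='bullet'])]", $container);
foreach ($texts as $text) {
    $pieces[] = trim($text->nodeValue);
}
print_r($pieces);
```

Headings, labels and counts such as `(18234)` all end up as separate entries, so grouping them per heading is left to further code.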
It looks like you want all the available labels under every `<span class="type">`. I would first use this XPath:
This will give you every `ul` with a child `span` that has a `@class` attribute with the value `type` and has an `ancestor-or-self::ul` that contains a `ul` whose `text()` has the value `Servers:`.
This XPath can be used in your code like this:
With this result:
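A rough sketch of the same idea (start from every `<span class="type">` and read the labels that belong to it), which is not the answerer's own expression: it walks `following-sibling::li` from each heading instead of using `ancestor-or-self::ul`. It assumes the same hypothetical `<div>`-wrapped fixture with assumed closing tags, and builds an array keyed by heading.

```php
<?php
// Illustrative fixture: the question's snippet in a hypothetical <div>,
// with closing tags assumed for the truncated end
$html = <<<'HTML'
<div>
<span class="type">Servers:</span>
<li class="server">
<span class="bullet">-</span>
<span class="label"></span> (19345)
<ul class="components">
<span class="type">Components:</span>
<li class="component">
<span class="bullet">+</span>
<span class="label">WebSphere:APPLICATION_SERVER</span> (18234)
<ul class="applications">
<span class="type">Applications:</span>
<li class="application">
<span class="bullet">+</span>
<span class="label">Ipostmutuaintra</span> (981)
<ul class="transactions">
<span class="type">Transactions:</span>
<li class="transaction">
<span class="label">/IpostMutuaIntra</span>
</li>
<li class="transaction">
<span class="label">/IpostMutuaIntra//importo/pdfInoltrata/A_*</span>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</div>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// For every "type" heading, collect the label (plus any count in parentheses)
// of each <li> that is a direct sibling of that heading
$result = [];
foreach ($xpath->query("//span[@class='type']") as $type) {
    $heading = trim($type->nodeValue);
    $result[$heading] = [];
    foreach ($xpath->query("following-sibling::li/span[@class='label']", $type) as $label) {
        // Text right after the label span, e.g. " (18234)"; empty string if there is none
        $count = $xpath->evaluate("normalize-space(following-sibling::text()[1])", $label);
        $result[$heading][] = trim($label->nodeValue . ' ' . $count);
    }
}
print_r($result);
```

Because `following-sibling::li` only looks at direct siblings, each heading picks up its own level and not the nested ones. Joining each list with `implode(' ', ...)` gives lines close to the `Servers: ...`, `Components: ...` format asked for; using only `$label->nodeValue` drops the counts.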