I have the following portion of an HTML file:

<span class="type">Servers:</span>
    <li class="server">
        <span class="bullet">-</span>
        <span class="label"></span> (19345)
        <ul class="components">
            <span class="type">Components:</span>
            <li class="component">
            <span class="bullet">+</span>
            <span class="label">WebSphere:APPLICATION_SERVER</span> (18234)
            <ul class="applications">
                <span class="type">Applications:</span>
                    <li class="application">
                        <span class="bullet">+</span>
                        <span class="label">Ipostmutuaintra</span> (981)
                            <ul class="transactions">
                                <span class="type">Transactions:</span>
                                <li class="transaction">
                                    <span class="label">/IpostMutuaIntra</span>
                                </li>
                                <li class="transaction">
                                    <span class="label">/IpostMutuaIntra//importo/pdfInoltrata/A_*

I want to grab the text between the tags:

Servers: 19345
Components: WebSphere:APPLICATION_SERVER (18234)
Applications: Ipostmutuaintra
Transactions: /IpostMutuaIntra
              /IpostMutuaIntra//importo/pdfInoltrata/A_*

and store it in an array.
I wrote the following code, a function called from another main script, which uses XPath:

<?php

$heading=parseToArray($xpath);
    
print_r ($heading);
//echo "Script executed on server " . strtoupper($heading[1]) ." for directory " . substr($heading[4],9,-21);
    
function parseToArray($xpath)
{

    //$xpathquery="//span[@class='".$class1."'][.='ITCAM for AD/WebSphere/J2EE']/following-sibling::ul/li[@class='".$class2."']/span";
    $xpathquery="//span[@class='label'][.='Servers:']";
    $elements = $xpath->query($xpathquery);
    
    if (!is_null($elements)) {
        $resultarray=array();
        foreach ($elements as $element) {
            $nodes = $element->childNodes;
            foreach ($nodes as $node) {
                $resultarray[] = $node->nodeValue;
            }
        }
        return $resultarray;
    }
}
?>

It grabs only "Servers:", and I can't get any further.
How can I get the rest of the text?

2 Answers


  1. If you start with the <p> that surrounds the Servers: span, it becomes easier to get the siblings, like so:

    $xpathquery="//p[./span[@class='type'][.='Servers:']]/following-sibling::ul/li";
    

    This query selects all the text you want, but you still need to parse out the individual pieces, since the input is rather unstructured.
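
Parsing the individual pieces out could look roughly like the sketch below. It assumes the fragment is wrapped in the <p> and <ul> elements that the query expects (they are not visible in the excerpt from the question) and works on a cut-down copy of the markup. The $servers array and its label/count keys are illustrative names only.

```php
<?php
// Cut-down sample markup. The <p> and <ul> wrappers are an assumption,
// added because the query from this answer needs them.
$html = '<p><span class="type">Servers:</span></p>'
      . '<ul><li class="server">'
      . '<span class="bullet">-</span>'
      . '<span class="label"></span> (19345)'
      . '</li></ul>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Query from the answer: the <li> items of the <ul> that follows the "Servers:" paragraph.
$items = $xpath->query("//p[./span[@class='type'][.='Servers:']]/following-sibling::ul/li");

$servers = array();
foreach ($items as $li) {
    // Text inside the label span (empty in this sample).
    $label = trim($xpath->evaluate('string(span[@class="label"])', $li));
    // First text node after the label span, e.g. " (19345)".
    $count = trim($xpath->evaluate('string(span[@class="label"]/following-sibling::text()[1])', $li));
    $servers[] = array(
        'label' => $label,
        'count' => trim($count, '()'), // strip the surrounding parentheses
    );
}

print_r($servers);
```

The relative expressions are evaluated with each <li> as the context node, so nested lists further down the tree do not leak into the result.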

  2. It looks like you want, for every <span class="type">, the available labels.

    I would first use this XPath:

    //ul[ ancestor-or-self::ul[span[@class='type' and text()='Servers:']]][span[@class='type']]
    

    This selects every ul that has a child span whose @class attribute is type, and that is, or is nested inside, a ul whose child span has the text() value Servers:

    This xpath can be used in your code like this:

    $xpath   = new DOMXPath($dom);
    $heading = parseToArray($xpath);
    
    print_r($heading);
    
    function parseToArray($xpath)
    {
        $xpathQueryUls = '//ul[ancestor-or-self::ul[span[@class="type" and text()="Servers:"]]][span[@class="type"]]';
        $ulElements    = $xpath->query($xpathQueryUls);
    
        if (!is_null($ulElements)) {
            $resultArray = array();
            foreach ($ulElements as $ulElement) {
                $xpathQueryType       = 'span[@class="type"]';
                $span                 = $xpath->query($xpathQueryType, $ulElement)[0]->nodeValue;
                $xpathQueryLabelNodes = 'li/span[@class="label"][text()]/text()|li/span[@class="label"]/following-sibling::node()[1][self::text()][normalize-space()]';
                /*
                 * $xpathQueryLabelNodes has two parts:
                 * 1. li/span[@class='label'][text()]/text()
                 *      gets the text()-content of the label-span (if there is text)
                 * 2. li/span[@class='label']/following-sibling::node()[1][self::text()][normalize-space()]
                 *      gets the text()-content of the first following text-node of the label-span (if there is text-content)
                 */
                $labels = $xpath->query($xpathQueryLabelNodes, $ulElement);
                foreach ($labels as $label) {
                    $resultArray[$span][] = $label->nodeValue;
                }
            }
            return $resultArray;
        }
    }
    
    ?>
    

    With this result:

    Array
    (
        [Servers:] => Array
            (
                [0] =>  (19345)
            
            )
    
        [Components:] => Array
            (
                [0] => WebSphere:APPLICATION_SERVER
                [1] =>  (18234)
                
            )
    
        [Applications:] => Array
            (
                [0] => Ipostmutuaintra
                [1] =>  (981)
                    
            )
    
        [Transactions:] => Array
            (
                [0] => /IpostMutuaIntra
                [1] => /IpostMutuaIntra//importo/pdfInoltrata/A_*
            )
    
    )
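
If the leading whitespace and the separate count entries get in the way, the result can be tidied afterwards. The sketch below is one possible approach and not part of the code above: tidyHeading() is a hypothetical helper that trims each value, drops empty ones, and assumes that a value of the form (digits) is a count belonging to the preceding label.

```php
<?php
// Hypothetical helper: normalise the array returned by parseToArray().
function tidyHeading(array $heading): array
{
    $tidy = array();
    foreach ($heading as $type => $values) {
        $key = rtrim($type, ':');            // "Servers:" becomes "Servers"
        $tidy[$key] = array();
        foreach ($values as $value) {
            $value = trim($value);           // drop surrounding whitespace and newlines
            if ($value === '') {
                continue;                    // skip whitespace-only text nodes
            }
            $last = count($tidy[$key]) - 1;
            // Assumption: "(digits)" is a count that belongs to the previous label.
            if ($last >= 0 && preg_match('/^\(\d+\)$/', $value)) {
                $tidy[$key][$last] .= ' ' . $value;
            } else {
                $tidy[$key][] = $value;
            }
        }
    }
    return $tidy;
}

// Sample input shaped like the result shown in this answer.
$heading = array(
    'Servers:'      => array(' (19345)'),
    'Components:'   => array('WebSphere:APPLICATION_SERVER', ' (18234)'),
    'Applications:' => array('Ipostmutuaintra', ' (981)'),
    'Transactions:' => array('/IpostMutuaIntra', '/IpostMutuaIntra//importo/pdfInoltrata/A_*'),
);

$tidy = tidyHeading($heading);
print_r($tidy);
```

Keeping each group as an array, rather than joining everything into one string, leaves the two Transactions entries separate.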
    