skip to Main Content

I have some annoying XML from an API response that looks like:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><report><QueryResult>
  <ResumptionToken>123456</ResumptionToken>
  <IsFinished>true</IsFinished>
  <ResultXml>
    <rowset xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
      <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:saw-sql="urn:saw-sql" targetNamespace="urn:schemas-microsoft-com:xml-analysis:rowset">
        <xsd:complexType name="Row">
          <xsd:sequence>
            <xsd:element maxOccurs="1" minOccurs="1" name="Column0" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="0" saw-sql:displayFormula="0" saw-sql:length="4" saw-sql:precision="12" saw-sql:scale="0" saw-sql:tableHeading="" saw-sql:type="integer" type="xsd:int"/>
            <xsd:element maxOccurs="1" minOccurs="0" name="Column1" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="ISBN" saw-sql:displayFormula=""Bibliographic Details"."ISBN"" saw-sql:length="255" saw-sql:precision="255" saw-sql:scale="0" saw-sql:tableHeading="Bibliographic Details" saw-sql:type="varchar" type="xsd:string"/>
            <xsd:element maxOccurs="1" minOccurs="0" name="Column2" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="ISSN" saw-sql:displayFormula=""Bibliographic Details"."ISSN"" saw-sql:length="255" saw-sql:precision="255" saw-sql:scale="0" saw-sql:tableHeading="Bibliographic Details" saw-sql:type="varchar" type="xsd:string"/>
            <xsd:element maxOccurs="1" minOccurs="0" name="Column3" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="Publication Date" saw-sql:displayFormula=""Bibliographic Details"."Publication Date"" saw-sql:length="255" saw-sql:precision="255" saw-sql:scale="0" saw-sql:tableHeading="Bibliographic Details" saw-sql:type="varchar" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:schema>
      <Row>
        <Column0>0</Column0>
        <Column1>55555555 444444445</Column1>
        <Column3>[2019]</Column3>

      </Row>
      <Row>
        <Column0>0</Column0>
        <Column1>555555555</Column1>
        <Column3>©2009.</Column3>
      </Row>

I’m using PHP’s SimpleXML to parse this data, but am struggling to access the column headers located in the non-default namespace under xsd:element. For example, I need to access the value: saw-sql:columnHeading="Publication Date", as this column can be dynamic and isn’t always "Publication Date". So I’m looking to pluck out the values for saw-sql[@columnHeading].

I’ve tried all manners of registering the namespaces with Xpath, using attributes() etc etc. The closest I got was:

$ResponseXml->registerXPathNamespace('xsd','http://www.w3.org/2001/XMLSchema');
$elements = $ResponseXml->xpath('//xsd:element[@minOccurs]');

This actually got me the default namespace attributes, but I need the ones for saw-sql, and the same method of:

$ResponseXml->registerXPathNamespace('saw-sql','urn:saw-sql');
$elements = $ResponseXml->xpath('//saw-sql:element[@columnHeading]');

does not get me any results.

3

Answers


  1. Your XPath //saw-sql:element[@columnHeading] is looking for elements named element (in the saw-sql namespace), which have attributes named columnHeading (in no namespace), but the element names are actually in the xsd namespace, while the attributes are in the saw-sql namespace.

    So I believe what you want is:

    $ResponseXml->registerXPathNamespace('xsd','http://www.w3.org/2001/XMLSchema');
    $ResponseXml->registerXPathNamespace('saw-sql','urn:saw-sql');
    $elements = $ResponseXml->xpath('//xsd:element[@saw-sql:columnHeading]');
    
    Login or Signup to reply.
  2. fwiw you could use DOMDocument to parse it instead of SimpleXML, for example

    <?php
    
    $xml = <<<'XML'
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?><report><QueryResult>
      <ResumptionToken>123456</ResumptionToken>
      <IsFinished>true</IsFinished>
      <ResultXml>
        <rowset xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
          <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:saw-sql="urn:saw-sql" targetNamespace="urn:schemas-microsoft-com:xml-analysis:rowset">
            <xsd:complexType name="Row">
              <xsd:sequence>
                <xsd:element maxOccurs="1" minOccurs="1" name="Column0" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="0" saw-sql:displayFormula="0" saw-sql:length="4" saw-sql:precision="12" saw-sql:scale="0" saw-sql:tableHeading="" saw-sql:type="integer" type="xsd:int"/>
                <xsd:element maxOccurs="1" minOccurs="0" name="Column1" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="ISBN" saw-sql:displayFormula=""Bibliographic Details"."ISBN"" saw-sql:length="255" saw-sql:precision="255" saw-sql:scale="0" saw-sql:tableHeading="Bibliographic Details" saw-sql:type="varchar" type="xsd:string"/>
                <xsd:element maxOccurs="1" minOccurs="0" name="Column2" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="ISSN" saw-sql:displayFormula=""Bibliographic Details"."ISSN"" saw-sql:length="255" saw-sql:precision="255" saw-sql:scale="0" saw-sql:tableHeading="Bibliographic Details" saw-sql:type="varchar" type="xsd:string"/>
                <xsd:element maxOccurs="1" minOccurs="0" name="Column3" saw-sql:aggregationRule="none" saw-sql:aggregationType="nonAgg" saw-sql:columnHeading="Publication Date" saw-sql:displayFormula=""Bibliographic Details"."Publication Date"" saw-sql:length="255" saw-sql:precision="255" saw-sql:scale="0" saw-sql:tableHeading="Bibliographic Details" saw-sql:type="varchar" type="xsd:string"/>
              </xsd:sequence>
            </xsd:complexType>
          </xsd:schema>
          <Row>
            <Column0>0</Column0>
            <Column1>55555555 444444445</Column1>
            <Column3>[2019]</Column3>
    
          </Row>
          <Row>
            <Column0>0</Column0>
            <Column1>555555555</Column1>
            <Column3>©2009.</Column3>
          </Row>
    XML;
    $domd = new DOMDocument();
    @$domd->loadHTML($xml);
    foreach ($domd->getElementsByTagName(strtolower("ResultXml")) as $resultXml) {
        $keyNames1 = [];
        $keyNames2 = [];
        foreach ($resultXml->getElementsByTagName("rowset") as $rowset) {
            foreach ($rowset->getElementsByTagName("sequence") as $sequence) {
                foreach ($sequence->getElementsByTagName("element") as $element) {
                    $keyNames1[] = $element->getAttribute("name");
                    $keyNames2[] = $element->getAttribute(strtolower("saw-sql:columnHeading"));
                }
            }
        }
        $rows = [];
        foreach ($resultXml->getElementsByTagName("row") as $row) {
            $rowData = [];
            foreach ($keyNames1 as $keyName1Key => $keyName1Name) {
                $tmp = $row->getElementsByTagName(strtolower($keyName1Name));
                if ($tmp->length) {
                    $rowData[$keyNames2[$keyName1Key]] = $tmp->item(0)->textContent;
                }
            }
            $rows[] = $rowData;
        }
        var_export($rows);
    }
    
    

    yields

    array (
      0 =>
      array (
        0 => '0',
        'ISBN' => '55555555 444444445',
        'Publication Date' => '[2019]',
      ),
      1 =>
      array (
        0 => '0',
        'ISBN' => '555555555',
        'Publication Date' => '©2009.',
      ),
    )
    
    • i used loadHTML instead of loadXML because DOMDocument insists that your XML is not valid XML; in loadHTML() mode, everything is lowercase for some reason, in loadXML() everything is case-sensitive.
    Login or Signup to reply.
  3. SimpleXMLElement::attributes() allows you to access the attributes of a specific namespace providing the namespace URI as a parameter.

    $value = $simpleXMLElement->attributes($namespaceURI);
    

    But first I would suggest defining a constant (or variable) for the namespaces that you are using. This will make your code a lot more readable and avoid typos.

    Be aware that "rowset" redefines the default namespace for itself and the descendant element nodes, they are not in the "empty/none" namespace.

    // define a dictionary for the namespaces
    const XMLNS = [
        // same alias as in the document
        'xsd' => 'http://www.w3.org/2001/XMLSchema',
        // let's use a shorter alias
        'saw' => 'urn:saw-sql',
        // own alias - used without alias in the document
        'rowset' => 'urn:schemas-microsoft-com:xml-analysis:rowset'
    ];
    
    $report = new SimpleXMLElement(getXMLString());
    foreach (XMLNS as $alias => $uri) {
        $report->registerXpathNamespace($alias, $uri);
    }
    
    $columns = [];
    foreach($report->xpath('//xsd:complexType[@name="Row"]/xsd:sequence/xsd:element') as $element) {
        $columns[] = [
            // read the attribute (fallback to '' if missing), cast to string
            'name'=> (string)($element['name'] ?? ''),
            // read attribute with namespace
            'heading'=> (string)($element->attributes(XMLNS['saw'])['columnHeading'] ?? '')
        ];
    }
    
    var_dump($columns);
    

    Output:

    array(4) {
      [0]=>
      array(2) {
        ["name"]=>
        string(7) "Column0"
        ["heading"]=>
        string(1) "0"
      }
      [1]=>
      array(2) {
        ["name"]=>
        string(7) "Column1"
        ["heading"]=>
        string(4) "ISBN"
      }
      [2]=>
      array(2) {
        ["name"]=>
        string(7) "Column2"
        ["heading"]=>
        string(4) "ISSN"
      }
      [3]=>
      array(2) {
        ["name"]=>
        string(7) "Column3"
        ["heading"]=>
        string(16) "Publication Date"
      }
    }
    

    The Xpath Expression

    • Fetch the type definitions: //xsd:complexType
    • Filter for "Row": //xsd:complexType[@name="Row"]
    • The elements inside the sequence:
      //xsd:complexType[@name="Row"]/xsd:sequence/xsd:element

    The part in [] are conditions for nodes returned the previous location path. So //foo[@bar] would return the foo element nodes with a bar attribute, while //foo/@bar would return the bar attributes of all foo element nodes.

    DOM

    This solution would not look much different with DOM. The Xpath processor is a separate object and here are specific methods to work with namespaces (suffix "NS"). DOM is more specific and powerful then SimpleXML.

    $document = new DOMDocument();
    $document->loadXML(getXMLString());
    $xpath = new DOMXpath($document);
    foreach (XMLNS as $alias => $uri) {
        $xpath->registerNamespace($alias, $uri);
    }
    
    $columns = [];
    foreach($xpath->evaluate('//xsd:complexType[@name="Row"]/xsd:sequence/xsd:element') as $element) {
        $columns[] = [
            'name'=> $element->getAttribute('name'),
            'heading'=> $element->getAttributeNS(XMLNS['saw'], 'columnHeading')
        ];
    } 
    
    var_dump($columns);
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search