I’m trying to access all the text within the Text node of the following XML document:
<Section>
<Subsection lims:inforce-start-date="2003-07-01" lims:fid="182941" lims:id="182941">
<Label>(2)</Label>
<Text>
In subsection (1),
<DefinedTermEn>beer</DefinedTermEn>
and
<DefinedTermEn>malt liquor</DefinedTermEn>
have the meaning assigned by section 4.
</Text>
</Subsection>
</Section>
With XPath, calling $xml->xpath("Body/Section/Subsection") returns the following:
object(SimpleXMLElement)#7 (3) {
["Label"]=>
string(3) "(2)"
["Text"]=>
string(64) "In subsection (1), and have the meaning assigned by section 4."
This makes the inner nodes disappear. Is there a way to "flatten" the content of all the subnodes within a node so that I get one continuous piece of text?
e.g. In subsection (1), beer and malt liquor have the meaning assigned by section 4.
2 Answers
Mixed nodes are too complex for SimpleXML – use DOM. The DOMNode::$textContent property returns the text content of any node. For element nodes this includes the text content of all descendant nodes. Also, DOMXpath::evaluate() supports expressions that return scalar values. If you cast a node list to a string within the expression, it returns the text content of the first node in the list.
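The two approaches described above might look roughly like the following sketch. It assumes a `Body` root element (taken from the XPath in the question) and leaves out the `lims:` attributes, because the excerpt does not show their namespace declaration; the variable names are illustrative only.

```php
<?php
// Sample document: the question's XML wrapped in an assumed <Body> root,
// with the lims: attributes omitted (their namespace is not shown).
$xml = <<<'XML'
<Body>
<Section>
<Subsection>
<Label>(2)</Label>
<Text>
In subsection (1),
<DefinedTermEn>beer</DefinedTermEn>
and
<DefinedTermEn>malt liquor</DefinedTermEn>
have the meaning assigned by section 4.
</Text>
</Subsection>
</Section>
</Body>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// Approach 1: fetch the element node and read its textContent property,
// which includes the text of all descendant nodes.
$textNode = $xpath->evaluate('/Body/Section/Subsection/Text')->item(0);
$viaProperty = $textNode->textContent;

// Approach 2: let XPath cast the node list to a string; string() yields
// the text content of the first node in the list.
$viaXpath = $xpath->evaluate('string(/Body/Section/Subsection/Text)');

// Both values still contain the line breaks from the source markup.
var_dump($viaProperty === $viaXpath);
echo $viaProperty;
```

Both variables hold the same string, including the words from the `DefinedTermEn` children. The line breaks of the original markup are preserved, so the result is not yet a single line.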
The answer @ThW posted explains why DOM is a better fit for this. However, that approach may leave you with a whitespace problem. You may want to write a function that recurses through the node tree within your Text element and builds a string, trimming the whitespace from each text node, so that you end up with a single line.
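A minimal sketch of such a recursive function could look like this. The name `flattenText` is hypothetical, the sample document assumes a `Body` root element, and the `lims:` attributes are omitted because their namespace declaration is not shown in the question.

```php
<?php
// Hypothetical helper: walks the children of $node recursively, trims each
// text node, skips the ones that are empty after trimming, and joins the
// remaining pieces with a single space.
function flattenText(DOMNode $node): string
{
    $parts = [];
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $trimmed = trim($child->nodeValue);
            if ($trimmed !== '') {
                $parts[] = $trimmed;
            }
        } elseif ($child instanceof DOMElement) {
            $inner = flattenText($child);
            if ($inner !== '') {
                $parts[] = $inner;
            }
        }
    }
    return implode(' ', $parts);
}

// Sample document: the question's XML wrapped in an assumed <Body> root.
$xml = <<<'XML'
<Body>
<Section>
<Subsection>
<Label>(2)</Label>
<Text>
In subsection (1),
<DefinedTermEn>beer</DefinedTermEn>
and
<DefinedTermEn>malt liquor</DefinedTermEn>
have the meaning assigned by section 4.
</Text>
</Subsection>
</Section>
</Body>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

$textNode = $xpath->evaluate('/Body/Section/Subsection/Text')->item(0);
$flat = flattenText($textNode);
echo $flat, "\n";
```

For the sample document this should print the single line the question asks for. Note that the sketch only trims the ends of each text node and always joins pieces with a space, so an inline element followed directly by punctuation would gain an extra space. If plain whitespace collapsing is enough, the XPath function `normalize-space()` passed to `evaluate()` may be a simpler alternative.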