I’m trying to get the contents of the paragraph in the following HTML:
<h4 class="m-b-0 text-dark synopsis"><p>This is the text I want.</p></h4>
There are several h4 elements, but only one with the class synopsis.
I’m able to get the h4 element with print_r($xpath->query("//h4[contains(@class, 'synopsis')]"));
but I’m unable to get the child paragraph contents.
What am I doing wrong?
2 Answers
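If
//h4[contains(@class, 'synopsis')]
selects the desired h4 elements, then
//h4[contains(@class, 'synopsis')]/p
will select the child p elements of the desired h4 elements, and
//h4[contains(@class, 'synopsis')]/p/text()
will select the text node children of those p elements.
You can obtain the string value of a node via string():
string(//h4[contains(@class, 'synopsis')]/p)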
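In PHP, these expressions can be tried with DOMXPath. The sketch below uses the markup from the question but parses it with loadXML() instead of loadHTML(), an assumption made only so that the parser keeps the p inside the h4. query() returns node lists for the element and text node selections, while evaluate() returns the result of string() directly as a PHP string.

```php
<?php
// Markup from the question. loadXML() is used instead of loadHTML() only so
// that the parser keeps the <p> inside the <h4> exactly as written.
$markup = '<h4 class="m-b-0 text-dark synopsis"><p>This is the text I want.</p></h4>';

$document = new DOMDocument();
$document->loadXML($markup);
$xpath = new DOMXPath($document);

// Element selection: a DOMNodeList with the child <p> elements.
$paragraphs = $xpath->query("//h4[contains(@class, 'synopsis')]/p");

// Text node selection: a DOMNodeList with the text children of those <p>.
$textNodes = $xpath->query("//h4[contains(@class, 'synopsis')]/p/text()");

// string() makes evaluate() return a plain PHP string.
$text = $xpath->evaluate("string(//h4[contains(@class, 'synopsis')]/p)");

echo $paragraphs->length, PHP_EOL;            // 1
echo $textNodes->item(0)->nodeValue, PHP_EOL; // This is the text I want.
echo $text, PHP_EOL;                          // This is the text I want.
```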
Note that the string() expression above assumes XPath 1.0 (or that there will only be one such p), in which case the string value of the first node of the node-set selected by //h4/p is returned. Passing a sequence of more than one node to string() would be an error in XPath 2.0 and higher, where you should instead use
string((//h4[contains(@class, 'synopsis')]/p)[1])
if there could be more than one such p, or
//h4[contains(@class, 'synopsis')]/p/string()
if you’d like the string values of all such p elements returned.
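PHP’s DOMXPath implements XPath 1.0 only, so it follows the first-node rule: string() applied to a node-set with several nodes returns the string value of the first one instead of failing. A small sketch with hypothetical two-paragraph markup (again parsed with loadXML() so the nesting is kept as written):

```php
<?php
// Hypothetical markup with two <p> children; loadXML() keeps the structure.
$markup = '<h4 class="synopsis"><p>First paragraph.</p><p>Second paragraph.</p></h4>';

$document = new DOMDocument();
$document->loadXML($markup);
$xpath = new DOMXPath($document);

// XPath 1.0: string() of a node-set uses the first node in document order,
// so no error is raised even though two <p> elements match.
$first = $xpath->evaluate("string(//h4[contains(@class, 'synopsis')]/p)");

// count() confirms that the node-set really holds two <p> elements.
$count = $xpath->evaluate("count(//h4[contains(@class, 'synopsis')]/p)");

// A positional predicate selects a specific paragraph instead.
$second = $xpath->evaluate("string(//h4[contains(@class, 'synopsis')]/p[2])");

echo $first, PHP_EOL;  // First paragraph.
echo $count, PHP_EOL;  // 2
echo $second, PHP_EOL; // Second paragraph.
```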
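An h4 cannot contain a p: headings only allow phrasing content, and p is not phrasing content. PHP’s DOMDocument will try to fix the HTML, so the p may not end up as a child of the h4. This can mostly be avoided with some loading flags.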
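A minimal sketch for checking what DOMDocument::loadHTML() does with the markup from the question. The exact repair depends on the libxml2 version PHP is linked against, so the code prints the resulting tree and tests the nesting instead of assuming it; the p element itself can be found either way.

```php
<?php
$html = '<h4 class="m-b-0 text-dark synopsis"><p>This is the text I want.</p></h4>';

$document = new DOMDocument();
// The invalid nesting is reported as parser errors; collect them instead of
// emitting PHP warnings.
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);

// Print the tree DOMDocument actually built.
echo $document->saveHTML($document->documentElement), PHP_EOL;

// Check whether the <p> is still a child of the <h4>.
$nested = (int) $xpath->evaluate("count(//h4[contains(@class, 'synopsis')]/p)");
echo $nested > 0 ? 'p is a child of h4' : 'p was moved out of h4', PHP_EOL;

// Whatever the repair did, the paragraph text is still in the document.
$text = $xpath->evaluate('string(//p)');
echo $text, PHP_EOL;
```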
The class attribute value consists of tokens separated by whitespace. A simple contains() will also match the string if it is part of another class name; for example, contains(@class, 'synopsis') is true for class="synopsis-title". To match whole tokens with XPath 1.0, use normalize-space() and concat(). The idea is to convert the attribute value to {space}classOne{space}classTwo{space} and match it against {space}classOne{space}. The expression can be built up step by step:
normalize-space(@class)
.concat(' ', normalize-space(@class), ' ')
[contains(concat(' ', normalize-space(@class), ' '), ' synopsis ')]
//*[contains(concat(' ', normalize-space(@class), ' '), ' synopsis ')]
string(//*[contains(concat(' ', normalize-space(@class), ' '), ' synopsis ')])
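A sketch comparing the substring test with the token test, on hypothetical markup where a second h4 carries a class named synopsis-title. The h4 elements contain plain text here so that loadHTML() has nothing to repair.

```php
<?php
// Hypothetical markup: "synopsis-title" contains the substring "synopsis"
// but is not the synopsis token.
$html = '<h4 class="m-b-0 text-dark synopsis">This is the text I want.</h4>'
    . '<h4 class="synopsis-title">Not this one.</h4>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Substring test: matches both h4 elements.
$substring = $xpath->query("//h4[contains(@class, 'synopsis')]");

// Token test: pad the normalized attribute with spaces, look for ' synopsis '.
$token = $xpath->query("//h4[contains(concat(' ', normalize-space(@class), ' '), ' synopsis ')]");

// The final expression from the list: string value of the first token match.
$text = $xpath->evaluate("string(//*[contains(concat(' ', normalize-space(@class), ' '), ' synopsis ')])");

echo $substring->length, PHP_EOL; // 2
echo $token->length, PHP_EOL;     // 1
echo $text, PHP_EOL;              // This is the text I want.
```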
If you want to fetch multiple nodes, remove the string() call from the XPath expression. The expression will then return a node list. Iterate over the nodes and read each node’s textContent property; it contains the contents of all descendant text nodes.
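A minimal sketch of that loop, assuming hypothetical markup with two elements carrying the synopsis class. They are div elements, which may contain p, so loadHTML() keeps the structure as written.

```php
<?php
// Hypothetical markup with two synopsis blocks.
$html = '<div class="synopsis"><p>First <b>synopsis</b>.</p></div>'
    . '<div class="card synopsis"><p>Second synopsis.</p></div>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// No string() around the expression: query() returns a DOMNodeList.
$nodes = $xpath->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' synopsis ')]");

$texts = [];
foreach ($nodes as $node) {
    // textContent concatenates all descendant text nodes, including the
    // text inside <b>.
    $texts[] = $node->textContent;
}

print_r($texts);
```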