I’ve got the following HTML:
<body>
<h1 id = 'example'>text</h1>
"My car is a "
<abbr>
<a href = 'exampleRef'>
Ferrari
</a>
</abbr>
"that goes 100 km/h"
</body>
I’m trying to extract the text "My car is a Ferrari that goes 100 km/h". The text is not contained in any single element, so I thought of using the following-sibling axis to extract at least "My car is a".
I tried the following expression:
//h1[@id ='example']/following-sibling::text()
and also
//h1[@id ='example']/following-sibling
but got no matches.
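A minimal sketch of which nodes are actually siblings of the <h1>. It assumes the markup is well-formed XML so that Python's standard-library xml.etree.ElementTree can parse it. ElementTree has no following-sibling axis, so the hypothetical helper following_sibling_texts emulates one: in ElementTree, the text node directly after an element is stored as that element's .tail. A standard XPath 1.0 engine evaluating //h1[@id='example']/following-sibling::text() would select the same two text nodes. "Ferrari" is not among them because it is nested inside <abbr><a>. The second expression, //h1[@id='example']/following-sibling, has no "::" after the axis name, so XPath reads it as a child element literally named following-sibling, which does not exist. If the first expression still matched nothing, a likely suspect is the tool evaluating it (for example, one that only returns elements, not text nodes).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the question; assumed to be well-formed XML
# because ElementTree is an XML parser, not a forgiving HTML parser.
HTML = """<body>
<h1 id="example">text</h1>
My car is a
<abbr>
<a href="exampleRef">
Ferrari
</a>
</abbr>
that goes 100 km/h
</body>"""


def following_sibling_texts(markup: str, heading_id: str) -> list[str]:
    """Rough stand-in for //h1[@id=heading_id]/following-sibling::text()."""
    root = ET.fromstring(markup)
    children = list(root)  # element children of <body>, in document order
    for i, child in enumerate(children):
        if child.tag == "h1" and child.get("id") == heading_id:
            # The text node after each sibling element is stored as its .tail
            tails = [child.tail] + [sib.tail for sib in children[i + 1:]]
            # Unlike XPath, drop whitespace-only nodes and trim the rest
            return [t.strip() for t in tails if t and t.strip()]
    return []


print(following_sibling_texts(HTML, "example"))
# ['My car is a', 'that goes 100 km/h']
```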
Answers
To extract the full text "My car is a Ferrari that goes 100 km/h" from this structure, you need text from three places: the text node that follows <h1>, the text inside <abbr><a>, and the text node that follows <abbr>. XPath 1.0 has no function that joins an arbitrary set of nodes into one string, so a single expression is usually not enough.
Instead, select the relevant text nodes individually with XPath and concatenate them in your host language. Here’s a step-by-step approach:
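1. Select the <h1> element with id="example": //h1[@id='example']
2. Select the text nodes that follow it as siblings ("My car is a " and "that goes 100 km/h"): //h1[@id='example']/following-sibling::text()
3. Select the text inside the <a> tag: //h1[@id='example']/following-sibling::abbr/a/text()
4. Concatenate the results in document order and normalize the whitespace.
If you’re able to use XPath 2.0+, you could use string-join() on the following sibling nodes…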
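A minimal sketch of the "select, then concatenate programmatically" approach, again using Python's standard-library xml.etree.ElementTree and assuming the markup is well-formed XML. Because ElementTree has no following-sibling axis, it does not run the XPath expressions from the steps. Instead it walks the <body> children after the <h1> and collects each text node (stored as an element's .tail) along with the text nested inside sibling elements. The function name text_after_heading is hypothetical. The final whitespace collapse plays the role of XPath's normalize-space().

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the question; assumed to be well-formed XML.
HTML = """<body>
<h1 id="example">text</h1>
My car is a
<abbr>
<a href="exampleRef">
Ferrari
</a>
</abbr>
that goes 100 km/h
</body>"""


def text_after_heading(markup: str, heading_id: str) -> str:
    """Join all text that follows <h1 id=heading_id> inside its parent."""
    root = ET.fromstring(markup)
    parts = []
    found = False
    for child in root:  # direct children of <body>, in document order
        if not found:
            if child.tag == "h1" and child.get("id") == heading_id:
                found = True
                # The text node right after <h1> is stored as its .tail
                parts.append(child.tail or "")
            continue
        # Text nested inside a following sibling, e.g. <abbr><a>Ferrari</a></abbr>
        parts.append("".join(child.itertext()))
        # Text node that follows this sibling element
        parts.append(child.tail or "")
    # Collapse runs of whitespace, similar to XPath's normalize-space()
    return " ".join("".join(parts).split())


print(text_after_heading(HTML, "example"))
# My car is a Ferrari that goes 100 km/h
```

With an XPath-capable library you would evaluate the expressions from the steps instead; the joining and whitespace handling stay the same.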