HTML - XPath to select text preceded by a specific element

Rodolfo
April 29, 2024
2 Answers

I’ve got the following HTML:

<body>
    <h1 id = 'example'>text</h1>
    "My car is a "
    <abbr>
        <a href = 'exampleRef'>
            Ferrari
        </a>
    </abbr>
    "that goes 100 km/h"
</body>

I’m trying to extract the text ‘My car is a Ferrari that goes 100 km/h’. The text is not contained in any specific element, so I thought of using the following-sibling axis to extract at least ‘My car is’. I tried the following expression:

//h1[@id ='example']/following-sibling::text()

and also

//h1[@id ='example']/following-sibling

but got no matches.

Answers

- AliRaza
- April 29, 2024 at 2:55 pm
- 0 votes
The text "My car is a Ferrari that goes 100 km/h" is spread across two bare text nodes and the <a> element nested inside <abbr>. Because of that structure, a single node-selecting XPath expression returns the pieces as separate nodes rather than as one concatenated string.

Instead, you can use XPath to individually select the relevant text nodes and then concatenate them programmatically. Here’s a step-by-step approach:
1. Identify Relevant Nodes: First, identify the nodes that contain the text parts you want to concatenate:
   - The text "My car is a "
   - The text "that goes 100 km/h"
   - The text "Ferrari" within the <a> tag
2. XPath to Select Specific Nodes:
   - To select the <h1> element with id="example":
     //h1[@id='example']
   - To select the text within the <a> tag:
```
//h1[@id='example']/following-sibling::abbr/a/text()
```
3. Extract Text Content: Use XPath to extract the text content of these nodes.
4. Concatenate Text: Combine the extracted text content programmatically to form the desired string.
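The extract-and-concatenate steps above can be sketched in Python. This is only an illustration under a few assumptions: it uses the standard library's xml.etree.ElementTree, whose limited XPath support does not include the following-sibling axis, so the sibling walk is done in plain Python instead of XPath. The snippet is assumed to parse as well-formed XML, the quotation marks shown in the question are assumed not to be part of the real text nodes, and text_after is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: the markup is well-formed XML and the quotes shown in the
# question are a display artifact, not part of the text nodes.
HTML = """<body>
    <h1 id='example'>text</h1>
    My car is a
    <abbr>
        <a href='exampleRef'>
            Ferrari
        </a>
    </abbr>
    that goes 100 km/h
</body>"""


def text_after(parent, marker_id):
    """Join all text that follows the child with id=marker_id inside parent.

    Hypothetical helper: only direct children of `parent` are inspected.
    """
    parts = []
    seen_marker = False
    for child in parent:
        if seen_marker:
            # Text inside a following sibling element (e.g. "Ferrari").
            parts.append("".join(child.itertext()))
        elif child.get("id") == marker_id:
            seen_marker = True
        if seen_marker:
            # ElementTree keeps the text node after an element in .tail.
            parts.append(child.tail or "")
    # Collapse runs of whitespace, similar in effect to normalize-space().
    return " ".join("".join(parts).split())


body = ET.fromstring(HTML)
print(text_after(body, "example"))
```

With a library that implements the full XPath 1.0 axes, the same pieces could be selected with following-sibling expressions and joined in the same way.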

- DanielHaley
- April 29, 2024 at 9:25 pm
- 0 votes
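If you’re able to use XPath 2.0+, you could use string-join() on the following sibling nodes. XPath 2.0 requires the separator argument (an empty string here); later versions allow omitting it.
```
normalize-space(string-join(//h1[@id='example']/following-sibling::node(), ''))
```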