I am trying to find an XPath expression that fetches the "td" elements between one "h2" tag and the next "h2" tag, or between an "h2" tag and the closing "table" tag, whichever comes first.
HTML Code
<html>
<body>
<table>
<tbody>
<tr>
<td colspan="2" class="dc-section" >
<h2>HEADING-1</h2>
</td>
</tr>
<tr>
<td class="dc-table-name" >Country</td>
<td class="dc-table-value" >India</td>
</tr>
<tr>
<td class="dc-table-name" >Country</td>
<td class="dc-table-value" >Nepal</td>
</tr>
<tr>
<td colspan="2" class="dc-section" >
<h2>HEADING-2</h2>
</td>
</tr>
<tr>
<td class="dc-table-name" >Country</td>
<td class="dc-table-value" >USA</td>
</tr>
<tr>
<td class="dc-table-name" >Country</td>
<td class="dc-table-value" >Canada</td>
</tr>
</tbody>
</table>
</body>
</html>
Given HEADING-1, I need the td elements with the values "Country"->"India" and "Country"->"Nepal". Given HEADING-2, I need the td elements with the values "Country"->"USA" and "Country"->"Canada".
I tried the XPath below, but for HEADING-1 it selects every "td" after the heading, including the ones under HEADING-2.
How can I write one common XPath expression that fetches the right "td" elements for both "HEADING-1" and "HEADING-2"?
XPath (not working):
//h2[text()='HEADING-1']/following::td
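The `following` axis takes every td that comes after the h2 in document order, so nothing stops it at the next heading. A minimal Python sketch, assuming the third-party lxml package is installed and using a compacted copy of the table above (`texts` is a hypothetical helper), makes the over-selection visible:

```python
# Assumes the third-party lxml package is installed (pip install lxml).
from lxml import etree

# Compacted copy of the question's table.
HTML = """<html><body><table><tbody>
<tr><td colspan="2" class="dc-section"><h2>HEADING-1</h2></td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">India</td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">Nepal</td></tr>
<tr><td colspan="2" class="dc-section"><h2>HEADING-2</h2></td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">USA</td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">Canada</td></tr>
</tbody></table></body></html>"""

doc = etree.fromstring(HTML)


def texts(nodes):
    """Hypothetical helper: whitespace-normalized text of each element."""
    return [" ".join("".join(n.itertext()).split()) for n in nodes]


# The question's expression: every td after the h2, in document order.
broken = doc.xpath("//h2[text()='HEADING-1']/following::td")
print(texts(broken))
```

It returns all nine cells after HEADING-1: the four it should, plus the cell holding HEADING-2 and the USA/Canada rows.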
Answers
NB: This is a reply to the original (unedited) question.
Using xmlstarlet here, but the expressions are plain XPath 1.0 and should be easy to follow.
(Edited 2023-04-26: missed the first part of the question)
The expression selects td elements that have a tr parent and no h2 children.
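A sketch of one XPath 1.0 expression built around that filter, not necessarily the answer's own: it additionally counts h2 elements, so a cell only matches when exactly one more h2 precedes it than precedes the chosen heading (that is, no later heading lies in between). It assumes lxml is installed; `EXPR` and `cells_under` are hypothetical names, and the heading is passed in as the XPath variable `$h`.

```python
# Assumes the third-party lxml package is installed (pip install lxml).
from lxml import etree

# Compacted copy of the question's table.
HTML = """<html><body><table><tbody>
<tr><td colspan="2" class="dc-section"><h2>HEADING-1</h2></td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">India</td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">Nepal</td></tr>
<tr><td colspan="2" class="dc-section"><h2>HEADING-2</h2></td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">USA</td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">Canada</td></tr>
</tbody></table></body></html>"""

doc = etree.fromstring(HTML)

# td cells after the chosen h2 that have a tr parent, contain no h2, and
# have exactly one more h2 before them than the chosen heading has.
EXPR = (
    "//h2[. = $h]/following::td[parent::tr][not(h2)]"
    "[count(preceding::h2) = count(//h2[. = $h]/preceding::h2) + 1]"
)


def cells_under(heading):
    """Hypothetical helper: text of the td cells belonging to `heading`."""
    return ["".join(td.itertext()).strip() for td in doc.xpath(EXPR, h=heading)]


print(cells_under("HEADING-1"))
print(cells_under("HEADING-2"))
```

Caveats: this assumes heading texts are unique, and the `following` axis does not stop at the closing table tag, so cells in a later table without any h2 would also match.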
As an alternative, if you want to process the blocks separately, you could use the EXSLT set:leading function.
Add a -C option before -t in the last command to get a copy of the stylesheet.
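A hand-written XSLT 1.0 sketch of the same per-block idea (not the stylesheet xmlstarlet generates), run through lxml, which is assumed to be installed. For each h2, set:leading keeps the following sibling rows that come before the next heading row, and returns all of them when there is no next heading, which covers the "closing table tag" case. Per the EXSLT definition, set:leading returns an empty set when the first node of its second argument is not in its first argument, which is why both arguments here are built from the same `following-sibling::tr` axis.

```python
# Assumes the third-party lxml package is installed (pip install lxml);
# its XSLT engine (libxslt) provides the EXSLT set functions.
from lxml import etree

# Compacted copy of the question's table.
HTML = """<html><body><table><tbody>
<tr><td colspan="2" class="dc-section"><h2>HEADING-1</h2></td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">India</td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">Nepal</td></tr>
<tr><td colspan="2" class="dc-section"><h2>HEADING-2</h2></td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">USA</td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">Canada</td></tr>
</tbody></table></body></html>"""

doc = etree.fromstring(HTML)

STYLESHEET = etree.XML("""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:set="http://exslt.org/sets"
    exclude-result-prefixes="set">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//h2">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
      <!-- Rows after this heading row, up to (not including) the next
           heading row; all remaining rows when there is no next heading. -->
      <xsl:for-each select="set:leading(../../following-sibling::tr,
                                        ../../following-sibling::tr[td/h2][1])/td">
        <xsl:text>  </xsl:text>
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(STYLESHEET)
result = str(transform(doc.getroottree()))
print(result)
```

The output lists each heading followed by its own cells, indented by two spaces.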
Maybe instead of trying to match the following td elements, you could check the first preceding tr that contains an h2 to see whether it matches…
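That check fits in a single XPath 1.0 expression. A minimal sketch, assuming lxml is installed (`EXPR` and `cells_under` are hypothetical names): `preceding-sibling` is a reverse axis, so `[1]` after the `[td/h2]` filter picks the nearest earlier heading row. Because it only looks at sibling rows, it also stops at the end of the table by construction.

```python
# Assumes the third-party lxml package is installed (pip install lxml).
from lxml import etree

# Compacted copy of the question's table.
HTML = """<html><body><table><tbody>
<tr><td colspan="2" class="dc-section"><h2>HEADING-1</h2></td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">India</td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">Nepal</td></tr>
<tr><td colspan="2" class="dc-section"><h2>HEADING-2</h2></td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">USA</td></tr>
<tr><td class="dc-table-name">Country</td><td class="dc-table-value">Canada</td></tr>
</tbody></table></body></html>"""

doc = etree.fromstring(HTML)

# Rows that are not heading rows, whose nearest earlier heading row
# contains the requested h2 text; then all td cells of those rows.
EXPR = "//tr[not(td/h2)][preceding-sibling::tr[td/h2][1]/td/h2 = $h]/td"


def cells_under(heading):
    """Hypothetical helper: text of the td cells belonging to `heading`."""
    return ["".join(td.itertext()).strip() for td in doc.xpath(EXPR, h=heading)]


print(cells_under("HEADING-1"))
print(cells_under("HEADING-2"))
```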