
This is a follow-on to my previous question.

I can’t quite work out the XSLT to do the following. I have some HTML with one or more <ul> tags. The <li> tags may contain <a> tags. I want to remove any <li> tag if it contains an anchor where the href meets a certain pattern.

Example:

<ul>
  <li><a href="/some/old/path">One</a></li>
  <li><a href="/other/old/path">Two</a></li>
  <li><a href="/some/older/path">Three</a></li>
  <li><a href="/other/older/path">Four</a></li>
</ul>

I wish to remove the <li> lines where the href contains older so the result would be:

<ul>
  <li><a href="/some/old/path">One</a></li>
  <li><a href="/other/old/path">Two</a></li>
</ul>

The lines I wish to remove could be in any order and scattered across multiple <ul> tags. I’m fine if I end up with an empty <ul></ul> pair (but bonus points if such a resulting empty list can be removed easily). <li> tags that do not contain an anchor or that contain a non-matching anchor should be left as-is.

I got close with the following:

<xsl:template match="li/a[contains(@href, 'older')]">
</xsl:template>

but this leaves the opening <li>:

<ul>
  <li><a href="/some/old/path">One</a></li>
  <li><a href="/other/old/path">Two</a></li>
  <li>
  <li>
</ul>

How do I get rid of the whole <li> line?

Here’s the full HTML I’m working with:

<html>
<head>
<!-- lots of stuff I don't care about -->
</head>
<body>
<div>
  <!-- lots of stuff I don't care about -->
  <div>
     <!-- lots of stuff I don't care about -->
     <div id="key_div">
         <div id="ignore_this">
           <!-- lots of stuff I don't care about -->
         </div>
         <p>More junk I don't want</p>
         <p>Even more junk I don't want</p>
         <h2><span class="someClass" id="someID">Header</span></h2>
         <p>Stuff I want to keep</p>
         <!-- A lot of stuff I want to keep -->
         <p>More stuff I want to keep</p>
         <ul>
           <li><a href="/some/old/path">One</a></li>
           <li><a href="/some/old/other">Two</a></li>
           <li><a href="/some/older/path">Three</a></li>
           <li><a href="/some/older/other">Four</a></li>
         </ul>
         <ul>
           <li>Leave this as-is</li>
         </ul>
     </div>
     <!-- lots of stuff I don't care about -->
  </div>
  <!-- lots of stuff I don't care about -->
</div>
</body>
</html>

And here’s the XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" indent="yes" encoding="utf-8"/>

    <xsl:template match="/html">
        <html>
            <head>
                <title></title>
            </head>
            <body>
                <xsl:apply-templates select="//div[@id='key_div']/h2"/>
            </body>
        </html>
    </xsl:template>

    <xsl:template match="h2">
        <h1>
            <xsl:value-of select="." />
        </h1>
        <xsl:apply-templates select="following-sibling::*"/>
    </xsl:template>

    <!-- My failed attempt to remove certain li lines -->
    <xsl:template match="li/a[contains(@href, 'older')]">
    </xsl:template>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

My current result:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title></title>
</head>
<body>
<h1>Header</h1>
<p>Stuff I want to keep</p>
<p>More stuff I want to keep</p>
<ul>
           <li><a href="/some/old/path">One</a></li>
           <li><a href="/some/old/other">Two</a></li>
           <li>
           <li>
         </ul>
<ul>
           <li>Leave this as-is</li>
         </ul>
</body>
</html>

I just need to figure out how to remove the full <li> line for the matching hrefs.

3 Answers


  1. Chosen as BEST ANSWER

    The answer by @DanielHaley worked for the specific example I posted. It turns out my real needs were just slightly more complicated and his answer resulted in more <li> tags being filtered than expected when I updated the condition.

    In my slightly more complicated case, I can actually have anchors such as:

    <ul>
      <li><a href="/some/path/old">One</a></li>
      <li><a href="/other/path/older">Two</a></li>
      <li><a href="/different/path/young">Two</a></li>
    </ul>
    

    and I only want to keep the anchors whose href ends with "old". The XSLT processor I have (macOS 14) doesn't support the ends-with function, which was only added in XPath 2.0. If it did, I could use:

    match="li[not(ends-with(a/@href, 'old'))]"
    

    Due to the lack of ends-with I need a match like:

    match="li[not(contains(a/@href, 'old')) or contains(a/@href, 'older')]"
    

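    But this also filters out any <li> tag that doesn't contain an anchor: for such an <li>, a/@href is an empty node-set, so contains(a/@href, 'old') is false and the not(...) branch is true.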

    The following change worked for my full case:

    <xsl:template match="li[a[not(contains(@href, 'old')) or contains(@href, 'older')]]" />
    

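    Now this only filters <li> tags that have an <a> tag whose href either doesn't contain "old" or contains "older". For my data that is equivalent to the href not ending with "old". <li> tags without an anchor are left alone.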
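
    For a stricter test than the contains() pair above, XPath 1.0 can emulate ends-with with substring() and string-length(). The following is only a sketch, not the stylesheet from the question: it assumes the suffix is the literal 'old' (3 characters, hence the "- 2") and that anchors without an href should be left alone.

    ```xml
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="html" indent="yes" encoding="utf-8"/>

        <!-- Identity template: copy anything no other template matches -->
        <xsl:template match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:template>

        <!-- Drop an li if it has an a whose href does NOT end with 'old'.
             substring(s, string-length(s) - 2) yields the last 3 characters of s. -->
        <xsl:template match="li[a[@href and substring(@href, string-length(@href) - 2) != 'old']]"/>
    </xsl:stylesheet>
    ```

    Unlike the contains() version, this would also drop an href such as "/old/path/new", where "old" appears but not at the end.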


  2. With:

    match="li/a[contains(@href, 'older')]"
    

    you’re matching the a element, so only the a is dropped and its parent li is still copied by the identity template.

    Try changing it to:

    match="li[contains(a/@href, 'older')]"
    

    (Untested and honestly I didn’t even look at your full XSLT.)
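
    As a rough sketch of how that fix could sit next to an identity template, together with one way to handle the "bonus" case of a list that ends up empty: the predicate below is a variant that tests every a child rather than only the first, and the second template is an assumption about what "empty" should mean (a ul whose li children are all removed).

    ```xml
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="html" indent="yes" encoding="utf-8"/>

        <!-- Identity template: copy anything no other template matches -->
        <xsl:template match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:template>

        <!-- Drop a whole li when any a child has an href containing 'older' -->
        <xsl:template match="li[a[contains(@href, 'older')]]"/>

        <!-- Optional: drop a ul that has li children, all of which are dropped above -->
        <xsl:template match="ul[li][not(li[not(a[contains(@href, 'older')])])]"/>
    </xsl:stylesheet>
    ```

    Both empty templates have a predicate, so their default priority (0.5) beats the identity template's and no explicit priority attribute should be needed.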

  3. "the version of XSLT I have (macOS 14) doesn’t support the ends-with function. If it did I could use:"

    match="li[ends-with(a/@href, 'old')]"

    Here is one possible implementation of ends-with() in XSLT 1.0:

    <xsl:template match="li[contains(concat(a/@href, '&#133;'), 'old&#133;')]"/>
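
    Note that an empty template with that match removes the <li> items whose href ends with "old". If the goal is the reverse (keep those and remove the others, as in the accepted answer), the same sentinel trick can be negated. A sketch, assuming the test should apply to each a child and that anchors without an href are left alone:

    ```xml
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="html" indent="yes" encoding="utf-8"/>

        <!-- Identity template: copy anything no other template matches -->
        <xsl:template match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:template>

        <!-- Append a sentinel character that should never occur in an href,
             then drop the li if 'old' + sentinel is NOT found -->
        <xsl:template match="li[a[@href and not(contains(concat(@href, '&#133;'), 'old&#133;'))]]"/>
    </xsl:stylesheet>
    ```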
    