
I want to fetch the name of the latest tar file uploaded to the PostgreSQL file server, and I want to automate the process.

The directory I am looking at is: https://ftp.postgresql.org/pub/odbc/versions/src/

I save the page at that URL to a file and parse it as XML.

The file looks something like this (it is an HTML directory index):

<html>
<head><title>Index of /pub/odbc/versions/src/</title></head>
<body bgcolor="white">
<h1>Index of /pub/odbc/versions/src/</h1><hr><pre><a href="../">../</a>
    <a href="psqlodbc-11.00.0000.tar.gz">psqlodbc-11.00.0000.tar.gz</a>                         17-Nov-2018 13:50              918461
    <a href="psqlodbc-11.01.0000.tar.gz">psqlodbc-11.01.0000.tar.gz</a>                         24-May-2019 14:28              919372
    <a href="psqlodbc-12.00.0000.tar.gz">psqlodbc-12.00.0000.tar.gz</a>                         11-Oct-2019 14:14              920713
    <a href="psqlodbc-12.01.0000.tar.gz">psqlodbc-12.01.0000.tar.gz</a>                         07-Jan-2020 13:53              932672
    <a href="psqlodbc-12.02.0000.tar.gz">psqlodbc-12.02.0000.tar.gz</a>                         26-May-2020 13:01              937847
    <a href="psqlodbc-13.00.0000.tar.gz">psqlodbc-13.00.0000.tar.gz</a>                         19-Nov-2020 09:53              940031
    <a href="psqlodbc-13.01.0000.tar.gz">psqlodbc-13.01.0000.tar.gz</a>                         02-May-2021 12:27              941064
    <a href="psqlodbc-7.2.3.tar.gz">psqlodbc-7.2.3.tar.gz</a>                                   16-Oct-2002 09:09              367168
    <a href="psqlodbc-7.2.4.tar.gz">psqlodbc-7.2.4.tar.gz</a>                                   12-Nov-2002 08:41              406385
    <a href="psqlodbc-7.2.5.tar.gz">psqlodbc-7.2.5.tar.gz</a>                                   29-Nov-2002 16:10              415885
</pre></hr></body>
</html>

I want to pick the most recently uploaded file, based on the date-modified column.

I tried

xmllint --xpath "string(//a[last()]/text())" myfile.xml

But that outputs psqlodbc-7.2.5.tar.gz, which is simply the last link in document order (not what I want).

I want the output to be psqlodbc-13.01.0000.tar.gz, since it has the most recent modification date (02-May-2021 12:27).

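I found a workaround:

# Number of <a> elements, including the "../" parent link at position 1
artifactCount=$(xmllint --xpath "count(//a)" psql.xml)

# Start below any date in the listing
latestModified="20010101"

# The i-th text node inside <pre> holds the date and size that follow the i-th link
for (( i=2; i<=${artifactCount}; i++ ))
do
  dateModified=$(xmllint --xpath "string(//pre/text()[$i])" psql.xml)
  # Unquoted echo collapses the padding; each awk call drops the last field
  # (first the size, then the time), leaving e.g. "17-Nov-2018"
  dateModified=$(echo ${dateModified} | awk '{$NF="";sub(/[ \t]+$/,"")}1')
  dateModified=$(echo ${dateModified} | awk '{$NF="";sub(/[ \t]+$/,"")}1')
  dateModified=$(date -d "$dateModified" +%Y%m%d)   # GNU date, e.g. 20181117

  # Only the day is compared, so of two uploads on the same day the first one wins
  if [ "${dateModified}" -gt "${latestModified}" ]
  then
    latestModified=${dateModified}
    j=${i}
  fi
done

psqlfile=$(xmllint --xpath "string((//a)[${j}]/text())" psql.xml)

echo "Latest file found: ${psqlfile} modified on ${latestModified}"

# Strip the prefix and suffix to get the bare version, e.g. 13.01.0000
psqlversion=${psqlfile#"psqlodbc-"}
psqlversion=${psqlversion%".tar.gz"}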
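The same date-based selection can also be done in a single awk pass instead of one xmllint call per row. This is a minimal sketch, assuming the one-entry-per-line layout of the listing above and English month abbreviations; `latest_by_date` is a hypothetical helper name, not part of any tool:

```shell
#!/usr/bin/env bash
# Sketch: print the most recently modified file from an autoindex-style listing.
# Assumes rows like: <a href="x">x</a>   17-Nov-2018 13:50   918461
latest_by_date() {
  awk '
    BEGIN {
      # Map English month abbreviations to two-digit numbers
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
      for (i = 1; i <= 12; i++) mon[m[i]] = sprintf("%02d", i)
    }
    # Only rows that contain a date such as 17-Nov-2018 13:50
    match($0, /[0-9][0-9]-[A-Z][a-z][a-z]-[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]/) {
      stamp = substr($0, RSTART, RLENGTH)   # "17-Nov-2018 13:50"
      split(stamp, p, /[- :]/)              # 17, Nov, 2018, 13, 50
      key = p[3] mon[p[2]] p[1] p[4] p[5]   # "201811171350", sorts numerically
      name = $0
      sub(/^.*">/, "", name)                # drop everything up to the link text
      sub(/<.*$/, "", name)                 # drop "</a>" and the rest of the row
      print key, name
    }
  ' "$1" | sort -n | tail -n 1 | cut -d' ' -f2
}
```

Usage, assuming the listing was saved as psql.xml: `latest_by_date psql.xml`. Because the key includes hours and minutes, two uploads on the same day do not tie.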


Answers


  1. Try this:

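    sed -E 's/([0-9]{2})-([a-zA-Z]{3})-([0-9]{4})/\1 \2 \3/' myfile.xml | sort -k5,5n -k4,4M -k3,3n -k6,6 | grep -oP '(?<=">).*(?=<)' | tail -1


    First, sed turns dates such as 17-Nov-2018 into 17 Nov 2018, so that day, month and year become separate whitespace-delimited fields (3, 4 and 5) that sort -k can address.

    Then sort orders the rows by year, month, day and time, in that order. The M modifier compares month abbreviations (Jan < Feb < ... < Dec), and n compares the year and day numerically, ignoring the variable-width padding in front of them.

    Finally, grep extracts the link text (the file name) and tail keeps the last, i.e. most recent, entry.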
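    To see what the sed step does, here is a sketch run on a single row taken from the listing (`sed -E` as supported by GNU and BSD sed; `row` and `converted` are illustrative variable names):

    ```shell
    # One row from the listing, as it appears in myfile.xml
    row='<a href="psqlodbc-13.01.0000.tar.gz">psqlodbc-13.01.0000.tar.gz</a>   02-May-2021 12:27   941064'
    # -E enables extended regexes: ( ) and {n} need no backslashes,
    # and \1 \2 \3 refer to the day, month and year groups
    converted=$(printf '%s\n' "$row" | sed -E 's/([0-9]{2})-([a-zA-Z]{3})-([0-9]{4})/\1 \2 \3/')
    echo "$converted"
    # Whitespace-separated fields are now:
    #   1 <a   2 href=...</a>   3 02   4 May   5 2021   6 12:27   7 941064
    echo "$converted" | awk '{ print "year=" $5, "month=" $4, "day=" $3, "time=" $6 }'
    ```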

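    Alternatively, the last column (the file size, e.g. 941064) happens to grow with every release in the sample above, so sorting on that column alone also works there:

    sort -k5,5n myfile.xml | grep -oP '(?<=">).*(?=<)' | tail -1

    This relies on the newest tarball also being the largest, which is a coincidence of this listing rather than a rule.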
    
  2. xmllint can’t open URLs and only supports XPath 1.0, last I checked. I’d suggest giving xidel a try.

    $ xidel -s https://ftp.postgresql.org/pub/odbc/versions/src/ -e 'x:lines(//pre)'
    ../
    psqlodbc-07.03.0100.tar.gz                        15-May-2003 15:56             446075
    psqlodbc-07.03.0200.tar.gz                        22-Oct-2003 13:46             451263
    [...]
    psqlodbc-7.2.5.tar.gz                             29-Nov-2002 16:10             415885
    
    $ xidel -s https://ftp.postgresql.org/pub/odbc/versions/src/ -e '
      x:lines(//pre)[position() gt 1]
    '
    psqlodbc-07.03.0100.tar.gz                        15-May-2003 15:56             446075
    psqlodbc-07.03.0200.tar.gz                        22-Oct-2003 13:46             451263
    psqlodbc-08.00.0100.tar.gz                        02-Mar-2005 14:35             586241
    [...]
    psqlodbc-7.2.5.tar.gz                             29-Nov-2002 16:10             415885
    

    (x:lines() is shorthand for tokenize(...,'\r\n?|\n'): it turns the input into a sequence in which every line is a separate item.)
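    Without xidel, a rough shell analogue of `x:lines(//pre)[position() gt 1]` for this particular listing is to keep the link rows, drop the `../` row and strip the tags. This is a sketch assuming GNU sed and the one-entry-per-line layout; `pre_lines` is a hypothetical helper, and a regex tag stripper is no substitute for an HTML parser:

    ```shell
    pre_lines() {
      grep '<a href=' "$1" |          # one listing entry per matching line
        grep -v 'href="\.\./"' |      # skip the parent-directory link
        sed -e 's/\r$//' \
            -e 's/<[^>]*>//g' \
            -e 's/^[[:space:]]*//'    # drop CR, tags and leading indentation
    }
    ```

    For example, `pre_lines psql.xml` prints plain-text rows of the same shape as the xidel output above: name, date and size separated by runs of spaces.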

    $ xidel -s https://ftp.postgresql.org/pub/odbc/versions/src/ -e '
      x:lines(//pre)[last()] ! tokenize(.,"\s{2,}")
    '
    psqlodbc-7.2.5.tar.gz
    29-Nov-2002 16:10
    415885
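    The `tokenize(.,"\s{2,}")` step can be mimicked in awk with a field separator that matches two or more blanks, so the single space inside the date does not split it. A sketch on one plain-text row (the variable names are illustrative):

    ```shell
    row='psqlodbc-7.2.5.tar.gz                             29-Nov-2002 16:10             415885'
    # FS = one blank followed by one or more blanks, i.e. "two or more"
    fields=$(printf '%s\n' "$row" | awk -F '[ \t][ \t]+' '{ for (i = 1; i <= NF; i++) print $i }')
    # Prints the name, the date with its time, and the size on separate lines
    printf '%s\n' "$fields"
    ```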
    
    $ xidel -s https://ftp.postgresql.org/pub/odbc/versions/src/ -e '
      x:lines(//pre)[last()] ! parse-ietf-date(tokenize(.,"\s{2,}")[2])
    '
    2002-11-29T16:10:00Z
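    Outside xidel, GNU date can perform the same conversion, since it accepts day-month-year dates such as 29-Nov-2002. A sketch assuming GNU coreutils `date` (BSD/macOS `date` uses different options) and treating the listing's times as UTC, as the `Z` in xidel's output does:

    ```shell
    stamp='29-Nov-2002 16:10'
    # -u: interpret and print in UTC; the format mirrors xidel's xs:dateTime output
    iso=$(date -u -d "$stamp" +%Y-%m-%dT%H:%M:%SZ)
    echo "$iso"
    ```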
    
    $ xidel -s https://ftp.postgresql.org/pub/odbc/versions/src/ -e '
      for $release in x:lines(//pre)[position() gt 1]
      order by parse-ietf-date(tokenize($release,"\s{2,}")[2])
      return $release
    '
    psqlodbc-7.2.3.tar.gz                             16-Oct-2002 09:09             367168
    psqlodbc-7.2.4.tar.gz                             12-Nov-2002 08:41             406385
    psqlodbc-7.2.5.tar.gz                             29-Nov-2002 16:10             415885
    [...]
    psqlodbc-13.01.0000.tar.gz                        02-May-2021 12:27             941064
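    The `order by` clause corresponds to the decorate-sort-undecorate idiom in shell: prefix every row with its timestamp, sort numerically, then remove the prefix. A sketch assuming GNU date and plain-text rows like the ones above; `sort_by_date` is a hypothetical helper:

    ```shell
    sort_by_date() {
      local row stamp
      while IFS= read -r row; do
        # Second field when splitting on two or more blanks, e.g. "02-May-2021 12:27"
        stamp=$(printf '%s\n' "$row" | awk -F '[ \t][ \t]+' '{ print $2 }')
        # Decorate: seconds since the epoch, a tab, then the original row
        printf '%s\t%s\n' "$(date -u -d "$stamp" +%s)" "$row"
      done | sort -n | cut -f2-   # sort on the numeric prefix, then undecorate
    }
    ```

    Appending `| tail -n 1` keeps only the newest row, which corresponds to wrapping the FLWOR expression in `( ... )[last()]`.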
    
    $ xidel -s https://ftp.postgresql.org/pub/odbc/versions/src/ -e '
      (
        for $release in x:lines(//pre)[position() gt 1]
        order by parse-ietf-date(tokenize($release,"\s{2,}")[2])
        return $release
      )[last()]
    '
    psqlodbc-13.01.0000.tar.gz                        02-May-2021 12:27             941064
    
    $ xidel -s https://ftp.postgresql.org/pub/odbc/versions/src/ -e '
      resolve-uri(
        (
          for $release in x:lines(//pre)[position() gt 1]
          let $item:=tokenize($release,"\s{2,}")
          order by parse-ietf-date($item[2])
          return $item[1]
        )[last()]
      )
    '
    https://ftp.postgresql.org/pub/odbc/versions/src/psqlodbc-13.01.0000.tar.gz
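    In shell, once the file name is known, building the same absolute URL is plain string concatenation, because the name is relative to the directory URL. A sketch (the variable names are illustrative):

    ```shell
    base='https://ftp.postgresql.org/pub/odbc/versions/src/'
    name='psqlodbc-13.01.0000.tar.gz'
    # ${base%/} strips a trailing slash, so exactly one is added back
    url="${base%/}/${name}"
    echo "$url"
    ```

    The URL can then be handed to a downloader, for example `curl -fLO "$url"`.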
    