I want to fetch the name of the latest tar file uploaded to the PostgreSQL artifactory, and I want to automate the process.
I am referring to: https://ftp.postgresql.org/pub/odbc/versions/src/
I am extracting an XML file from the above URL to parse.
The XML file looks something like this:
<html>
<head><title>Index of /pub/odbc/versions/src/</title></head>
<body bgcolor="white">
<h1>Index of /pub/odbc/versions/src/</h1><hr><pre><a href="../">../</a>
<a href="psqlodbc-11.00.0000.tar.gz">psqlodbc-11.00.0000.tar.gz</a> 17-Nov-2018 13:50 918461
<a href="psqlodbc-11.01.0000.tar.gz">psqlodbc-11.01.0000.tar.gz</a> 24-May-2019 14:28 919372
<a href="psqlodbc-12.00.0000.tar.gz">psqlodbc-12.00.0000.tar.gz</a> 11-Oct-2019 14:14 920713
<a href="psqlodbc-12.01.0000.tar.gz">psqlodbc-12.01.0000.tar.gz</a> 07-Jan-2020 13:53 932672
<a href="psqlodbc-12.02.0000.tar.gz">psqlodbc-12.02.0000.tar.gz</a> 26-May-2020 13:01 937847
<a href="psqlodbc-13.00.0000.tar.gz">psqlodbc-13.00.0000.tar.gz</a> 19-Nov-2020 09:53 940031
<a href="psqlodbc-13.01.0000.tar.gz">psqlodbc-13.01.0000.tar.gz</a> 02-May-2021 12:27 941064
<a href="psqlodbc-7.2.3.tar.gz">psqlodbc-7.2.3.tar.gz</a> 16-Oct-2002 09:09 367168
<a href="psqlodbc-7.2.4.tar.gz">psqlodbc-7.2.4.tar.gz</a> 12-Nov-2002 08:41 406385
<a href="psqlodbc-7.2.5.tar.gz">psqlodbc-7.2.5.tar.gz</a> 29-Nov-2002 16:10 415885
</pre></hr></body>
</html>
I want to fetch the latest version listed in the XML, based on the modification date.
I tried:
xmllint --xpath "string(//a[last()]/text())" myfile.xml
But it outputs psqlodbc-7.2.5.tar.gz, which is not what I want.
I want the output to be psqlodbc-13.01.0000.tar.gz, since it has the most recent modification date (02-May-2021 12:27).
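The index page appears to order entries by file name compared as plain strings, not by date. That would explain why //a[last()] picks 7.2.5: byte-wise, "psqlodbc-1..." sorts before "psqlodbc-7...". A quick check with sort in the C locale (plain byte order) reproduces that ordering; last_by_name is a hypothetical helper used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sort names in byte order (C locale) and keep the last one.
# "1" (0x31) sorts before "7" (0x37), so the 7.x tarball ends up last,
# the same position it occupies in the index page.
last_by_name() {
  LC_ALL=C sort | tail -n 1
}

printf '%s\n' psqlodbc-13.01.0000.tar.gz psqlodbc-7.2.5.tar.gz psqlodbc-11.00.0000.tar.gz \
  | last_by_name   # psqlodbc-7.2.5.tar.gz
```

So taking the last link only works if the page is sorted by date, which this listing is not.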
I found a workaround:
artifactCount=$(xmllint --xpath "count(//a)" psql.xml)
latestModified="20010101"
# Start at 2: //a[1] is the "../" parent-directory link
for (( i=2; i<=${artifactCount}; i++ ))
do
  # Text node after the i-th link, e.g. " 17-Nov-2018 13:50 918461"
  dateModified=$(xmllint --xpath "string(//pre/text()[$i])" psql.xml)
  # Drop the last field twice: first the size, then the time
  dateModified=$(echo ${dateModified} | awk '{$NF="";sub(/[ \t]+$/,"")}1')
  dateModified=$(echo ${dateModified} | awk '{$NF="";sub(/[ \t]+$/,"")}1')
  # 17-Nov-2018 -> 20181117 (GNU date)
  dateModified=$(date -d "$dateModified" +%Y%m%d)
  if [ "${dateModified}" -gt "${latestModified}" ]
  then
    latestModified=${dateModified}
    j=${i}
  fi
done
psqlfile=$(xmllint --xpath "string(//a[${j}]/text())" psql.xml)
echo "Latest file found: ${psqlfile} modified on ${latestModified}"
# Strip prefix and suffix, e.g. psqlodbc-13.01.0000.tar.gz -> 13.01.0000
psqlversion=${psqlfile#"psqlodbc-"}
psqlversion=${psqlversion%".tar.gz"}
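The loop above keys each entry on its date only, so two uploads on the same day would tie. A minimal variant of the conversion step keeps the time as well. Like the workaround, it assumes GNU date; entry_key is a hypothetical helper name, not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn one listing text node such as
# " 02-May-2021 12:27 941064" into a sortable YYYYmmddHHMM key.
entry_key() {
  # Unquoted on purpose: word splitting drops the padding and trailing newline,
  # leaving $1=date, $2=time, $3=size (ignored).
  set -- $1
  # GNU date; TZ=UTC makes parsing and printing round-trip without DST surprises.
  TZ=UTC date -d "$1 $2" +%Y%m%d%H%M
}

entry_key " 02-May-2021 12:27 941064"   # 202105021227
```

Inside the loop, the two awk lines and the date line could then collapse into dateModified=$(entry_key "$(xmllint --xpath "string(//pre/text()[$i])" psql.xml)"), with latestModified starting at a 12-digit value such as 200101010000.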
Answers
Try this:
First, use sed to reformat dates like 17-Nov-2018, replacing - with a blank so the fields can then be sorted with sort -k. Then use sort -k to order by year, month, day, and time. Finally, use grep and tail to pick out the last file name.
Alternatively, I found that in this listing the last column (the size, e.g. 941064) also increases in the same order, so it works using only this command:
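A minimal sketch of the same idea, not the answerer's exact command: sed rewrites each entry line as "YYYY Mon DD HH:MM NAME", sort orders those fields by year, month, day and time, and tail keeps the newest line. It assumes one entry per line, as in the listing above, and a sort that supports -M month-name ordering (GNU and BSD sort both do); latest_tarball is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print the name of the newest entry in a directory listing read from stdin.
# Assumes one entry per line: <a href="NAME">NAME</a>  DD-Mon-YYYY HH:MM  SIZE
latest_tarball() {
  # 1. sed: keep only entry lines, rewritten as "YYYY Mon DD HH:MM NAME"
  #    (the "../" line has no date after its link, so it never matches)
  # 2. sort: year numerically, month by name (-M), day numerically, then time
  # 3. tail/awk: keep the newest line and print just the file name
  sed -n 's/^<a href="\([^"]*\)">[^<]*<\/a>[[:space:]]*\([0-9]\{2\}\)-\([A-Za-z]\{3\}\)-\([0-9]\{4\}\)[[:space:]]*\([0-9:]\{5\}\).*/\4 \3 \2 \5 \1/p' \
    | LC_ALL=C sort -k1,1n -k2,2M -k3,3n -k4,4 \
    | tail -n 1 \
    | awk '{print $5}'
}
```

For example, latest_tarball < psql.xml should print psqlodbc-13.01.0000.tar.gz for the listing in the question. Unlike the size shortcut, this does not depend on newer files happening to be larger.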
xmllint can't open URLs and only supports XPath 1.0, last I checked. I'd suggest you give xidel a try. (x:lines() is a shorthand for tokenize(...,'\r\n?|\n') and turns the input into a sequence where every line is a separate item.)