I am writing a backup script in bash and want to delete all backups older than X days, while leaving at least the newest N backups.
This seems so simple, yet I have not been able to find a solution to the whole problem.
In my case, each backup is a single folder, and all of these folders sit in the same parent folder.
To decide which backups are the newest, the creation or modification date should be used. If that gets too complicated, I can also sort alphabetically by folder name (the names consist of a constant part plus a YYYYMMDD datestamp).
I want my script to work for any paths and folder names, so I cannot assume they contain no spaces, linebreaks, etc. It should also work on most modern Linux systems; currently I am running it on Ubuntu 22.04.3 LTS.
I had a few different ideas, which all fall short in some way.
Parameters I used:
target_basefolder="/path/to/backups/parent/folder"
min_n_backups=3
backup_keepdays=28
Version A – Oneliner
I tried this:
find ${target_basefolder}/* -maxdepth 1 -type d -print0 | sort -rz | xargs -r0 rm -rf
But I do not know how to tell it to ignore the newest three results. I tried to put tail -n +$((min_n_backups+1)) in there, but could not find a way to tell tail to use NUL as the separator instead of newline (like the -0 option for xargs).
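For what it's worth, GNU coreutils (since 8.25, so including Ubuntu 22.04) does ship a -z/--zero-terminated flag for tail, so a NUL-safe variant of Version A is possible. A sketch (the folder names and the mktemp parent are made up for the demo, and it only covers the keep-newest-N part, not yet the age cutoff):

```shell
#!/bin/bash
# Demo: keep the newest $min_n_backups folders (newest-first by name),
# delete the rest. Requires GNU sort/tail/xargs for -z / -0.
target_basefolder=$(mktemp -d)   # stand-in for the real parent folder
min_n_backups=3
mkdir "$target_basefolder"/backup_{20230101,20230201,20230301,20230401,20230501}

find "$target_basefolder" -mindepth 1 -maxdepth 1 -type d -print0 \
    | sort -rz \
    | tail -z -n "+$((min_n_backups + 1))" \
    | xargs -r0 rm -rf

ls "$target_basefolder"   # the three newest remain
```

Note this uses find on the quoted parent folder with -mindepth 1 -maxdepth 1 rather than the unquoted ${target_basefolder}/* glob, which keeps it safe for paths with spaces.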
Version B – Two Parts
Count the folders first and then run the delete command only if there are enough newer backups.
n_backups=$(find ${target_basefolder}/* -maxdepth 1 -type d -ctime -$backup_keepdays -printf '.' | wc -m)
if (( $n_backups > $min_n_backups )) ; then
find ${target_basefolder}/* -maxdepth 1 -type d -ctime +$((min_n_backups+1)) -print0 | sort -rz | xargs -r0 rm -rf
fi
The problem here is that it will not delete anything if there are not enough newer backups present. For example if min_n_backups is 3 and there are only 2 backups newer than backup_keepdays, it will not delete any of the maybe 100 older backups instead of leaving just 1 of them and deleting the rest.
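The policy described here (always keep the newest N; of the rest, delete only those past the age cutoff) can be sketched by materializing the sorted list first. A hypothetical sketch, assuming bash ≥ 4.4 for mapfile -d '', using mtime instead of creation time (which Linux filesystems don't portably expose), and with demo folder names:

```shell
#!/bin/bash
# Demo setup (hypothetical names): three recent backups, two stale ones.
target_basefolder=$(mktemp -d)
min_n_backups=3
backup_keepdays=28
mkdir "$target_basefolder"/backup_{20230101,20230102,20230830,20230831,20230901}
touch -d '100 days ago' "$target_basefolder"/backup_202301*

# NUL-safe: read the name-sorted list (newest first) into an array ...
mapfile -d '' -t folders < <(
    find "$target_basefolder" -mindepth 1 -maxdepth 1 -type d -print0 | sort -rz
)

# ... always keep the newest N, and delete the rest only if past the cutoff.
for ifolder in "${folders[@]:min_n_backups}"; do
    if [ -z "$(find "$ifolder" -maxdepth 0 -mtime "-$backup_keepdays")" ]; then
        rm -rf "$ifolder"
    fi
done
```

With this split, the two stale folders beyond the newest three get deleted, while a stale folder that happened to be among the newest three would survive.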
Version C – Simple Loop
Go through backup folders one by one and check their date.
icount=1
for ifolder in $(find ${target_basefolder}/* -maxdepth 1 -type d -print0 | sort -rz ) ; do
is_old=$(find "$ifolder" -maxdepth 0 -type d -ctime +$((min_n_backups+1)) -printf '.' | wc -m)
if (( $icount > $min_n_backups && $is_old > 0 )) ; then
rm -rf $ifolder
fi
((icount++))
done
I have yet to test this, especially with paths containing spaces and linebreaks. I am not sure whether the for loop can handle a NUL-separated list, or whether I can pass find a folder name containing spaces/linebreaks.
But it feels overly complicated anyway, so I hope I won’t have to go there.
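On those two open questions: a for loop over $(...) cannot consume a NUL-separated list (word splitting drops the NULs and splits on whitespace), but a while IFS= read -r -d '' loop can, and find is perfectly happy with a quoted starting path containing spaces or newlines. A small sketch with made-up directory names:

```shell
#!/bin/bash
# Sketch: reading find's -print0 output NUL-safely; demo dir names,
# one with a space and one with an embedded newline.
parent=$(mktemp -d)
mkdir "$parent/backup 20230101" "$parent/backup
20230102"

count=0
while IFS= read -r -d '' ifolder; do
    # "$ifolder" arrives intact here, spaces and newlines included,
    # and can safely be passed back to find or rm -rf.
    count=$((count + 1))
done < <(find "$parent" -mindepth 1 -maxdepth 1 -type d -print0 | sort -rz)
echo "$count"   # 2
```

The process substitution < <(...) keeps the loop body in the current shell, so variables modified inside it (like count) survive, unlike with a plain pipe into while.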
Version D – Brute Force
I had the idea to move the newest N backup folders to a different folder, delete the folders older than M days, and then move the N folders back. This could get slow if the folders are big, and it does not feel "right" somehow…
I hope someone can help me solve my Version A or give me hints on the other versions.
Thanks in advance!
Answers
You can use GNU awk for selecting the directories in the pipe; but before that you have to consider a few problems:
- $target_basefolder should be double-quoted
- you probably want find "$target_basefolder" instead of find "$target_basefolder"/*
- the creation time might not be supported by your filesystem, so you better use the timestamps in the names

Here's a solution that will work with your YYYYMMDD format, as long as the backup dirs end with the timestamp (e.g. backup_12345678_20230831):