I would like to rename PDF files I have in an AWS S3 bucket, changing their names from the internal numbers we currently use to UUIDs we have in a PostgreSQL table.
Currently, I’m trying to do it like this:
```shell
export PGPASSWORD="<my_password>"

# head -n -2 is only for eliminating extra text (the row-count footer)
echo "SELECT id, number FROM mytable" \
  | psql -h <local-docker-ip> -p <docker-port> -U <user> \
  | head -n -2 \
  | awk 'NR > 2 { aws s3 --profile Admin-Profile mv "s3://<bucket>/documents/" $3 ".pdf" " " "s3://<bucket>/documents/" $1 ".pdf" }'
```
The string concatenation I'm using in `awk` is probably wrong, but I don't know how to work around it. Also, if anyone knows a better way of doing this, I'm open to suggestions, of course.
2 Answers
The main problem is likely due to how you're trying to use AWS CLI commands within an `awk` statement. `awk` is not a shell and therefore can't directly execute shell commands that way.

A possible solution is to use `awk` to generate a shell script with all the necessary AWS CLI commands, and then execute that script. Here's how you could modify your code, and then run the generated script (make sure it's executable first):
This code constructs a shell script `aws_commands.sh` with `aws s3 mv` commands based on the output of your PostgreSQL query, and then executes that script.

One thing to note is the use of `printf` in the `awk` command. It formats the output as a string, with each `%s` replaced by the following arguments in order. Also, remember that awk fields are referenced with `$` (`$1`, `$3`), not concatenated as quoted strings.

Please check `aws_commands.sh` for correctness before executing it. If the number of files to be renamed is large, consider testing the script on a small subset of files first.

Additionally, this approach might have performance issues if you're dealing with a large number of files, because it executes a separate `aws s3 mv` command for each file. If performance becomes an issue, you might want to look into using the AWS SDK for Python (Boto3) to perform these operations more efficiently.

In `awk` there is actually a way to execute shell commands: the `system()` function, or a `print` statement piped into a shell command. Either could be used to run the AWS CLI commands directly. However, before we proceed, it's important to understand that `system()` (or piping) in awk creates a new shell for each command, which can be significantly slower and consume more resources if you're dealing with a large amount of data. That's why generating a script and then running it is generally more efficient.

If you're still interested in this approach, you can modify your `awk` command like this:
The `sprintf` function generates the command string, and the `system()` function then executes it. Please remember that running shell commands directly from `awk` should be done with caution, since each command is run exactly as constructed, without any checks for errors or unexpected conditions. If there is any risk of shell injection or otherwise malicious values in your data, the first approach (generating a script and reviewing it before running it) would be safer.