
I would like to rename PDF files I have in an AWS S3 bucket, changing their names from the internal numbers we currently use to the UUIDs we have in a PostgreSQL table.

Currently, I’m trying to do it like this:

export PGPASSWORD="<my_password>"

echo "SELECT id, number FROM mytable" |
  psql -h <local-docker-ip> -p <docker-port> -U <user> |
  head -n -2 |     # this is only for eliminating extra text
  awk 'NR > 2 { aws s3 --profile Admin-Profile mv "s3://<bucket>/documents/" $3 ".pdf" " " "s3://<bucket>/documents/" $1 ".pdf" }'

The string concatenation I’m using inside awk is probably wrong, but I don’t know how to fix it. Also, if anyone knows a better way of doing this, I’m open to suggestions, of course.

2 Answers


  1. The main problem is likely in how you’re trying to use AWS CLI commands within an awk statement: awk is not a shell, so it can’t directly execute shell commands written inline in an awk action like that.

    A possible solution to this is to use awk to generate a shell script with all the necessary AWS CLI commands, and then execute that script. Here’s how you could modify your code:

    export PGPASSWORD="<my_password>"

    echo "SELECT id, number FROM mytable" |
      psql -h <local-docker-ip> -p <docker-port> -U <user> |
      head -n -2 |
      awk 'NR > 2 { printf("aws s3 --profile Admin-Profile mv \"s3://<bucket>/documents/%s.pdf\" \"s3://<bucket>/documents/%s.pdf\"\n", $3, $1) }' \
      > aws_commands.sh
    

    Then you can run this script (make sure it’s executable):

    chmod +x aws_commands.sh
    ./aws_commands.sh
    

    This pipeline writes a shell script, aws_commands.sh, containing one aws s3 mv command per data row returned by your PostgreSQL query; running that script then performs the renames.

    One thing to note is the use of printf in the awk command: it formats the output string, replacing each %s with the following arguments in order. The inner quotes are escaped as \" so they end up as literal quotes in the generated script, and \n terminates each generated command. Also note that $3 and $1 here are awk field references (the number and id columns as psql prints them), not shell variables.
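
    As a quick illustration, with a made-up value mimicking one "id | number" row from psql’s default aligned output (the | separator lands in $2, so the columns are $1 and $3):

    echo 'abc-123 | 42' | awk '{ printf("mv \"%s.pdf\" \"%s.pdf\"\n", $3, $1) }'
    # prints: mv "42.pdf" "abc-123.pdf"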

    Please check aws_commands.sh for correctness before executing it. If the number of files to be renamed is large, consider testing on a small subset of files first, as sketched below.
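
    For instance, a minimal test sketch, assuming the generated file looks exactly like the printf output above. The AWS CLI’s --dryrun flag (supported by aws s3 mv) prints what would happen without moving anything:

    # preview the first five renames without touching S3
    head -n 5 aws_commands.sh | sed 's/ mv / mv --dryrun /' | bash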

    Additionally, this approach might have performance issues if you’re dealing with a large number of files, because it runs a separate aws s3 mv command (and thus a separate process and API round-trip) for each file. If performance becomes an issue, you might want to look into using the AWS SDK for Python (Boto3) to perform these operations more efficiently.
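
    Short of switching to Boto3, one shell-only mitigation (a sketch, not what this answer proposes) is to run the CLI calls in parallel with xargs -P: each file still costs one aws process, but wall-clock time drops. This version also uses psql’s -t/-A/-F flags to emit clean, unaligned tuples, so the head / NR filtering is no longer needed. Host, port, bucket, and profile names are placeholders, and it assumes ids and numbers contain no whitespace:

    echo "SELECT id, number FROM mytable" |
      psql -h <local-docker-ip> -p <docker-port> -U <user> -t -A -F ' ' |
      awk 'NF == 2 { printf("s3://<bucket>/documents/%s.pdf\ns3://<bucket>/documents/%s.pdf\n", $2, $1) }' |
      xargs -n 2 -P 8 aws s3 --profile Admin-Profile mv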

  2. In awk, there is actually a way to execute shell commands: the system() function, or a print statement piped into a shell command, either of which could be used to run AWS CLI commands directly. Before we proceed, though, it’s important to understand that using system() or a pipe in awk spawns a new shell for each command, which can be significantly slower and consume more resources when you’re dealing with a large amount of data. That’s why generating a script and then running it is generally more efficient.
    If you’re still interested in this approach, you can modify your awk command like this:

    awk 'NR > 2 { cmd = sprintf("aws s3 --profile Admin-Profile mv \"s3://<bucket>/documents/%s.pdf\" \"s3://<bucket>/documents/%s.pdf\"", $3, $1); system(cmd) }'
    

    The sprintf function generates the command string, and system() then executes it. Please remember that running shell commands directly from awk should be done with caution, since each command is run exactly as constructed, without any checks for errors or unexpected conditions. If there is any risk that values in the table contain shell metacharacters (a command-injection risk), the first approach (generating a script you can inspect before running it) would be safer.
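
    As a small hardening sketch (same placeholder names as above): awk’s system() returns the command’s exit status, so failed moves can at least be logged instead of silently ignored:

    awk 'NR > 2 {
        cmd = sprintf("aws s3 --profile Admin-Profile mv \"s3://<bucket>/documents/%s.pdf\" \"s3://<bucket>/documents/%s.pdf\"", $3, $1)
        if (system(cmd) != 0)                        # non-zero exit status: the mv failed
            print "failed: " cmd > "/dev/stderr"     # log the command and keep going
    }'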
