Context: I have 2 buckets, Bucket A and Bucket B. Bucket A had all of its contents placed in Bucket B via the aws s3 sync CLI command.
Problem: I want to delete all the items in Bucket B that also exist in Bucket A, without deleting anything in Bucket A.
E.g.
Bucket A (Source):
- File R
- File G
- File C
Bucket B (Destination):
- File A
- File R
- File G
- File C
- File O
^^ I need to delete all files in the destination bucket that also exist in the source bucket, so only files R, G, and C need to be deleted from Bucket B.
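A minimal sketch of the selection being asked for, assuming both buckets' keys have already been listed and that the sync copied objects to the same key names (no extra prefix in Bucket B). The helper name keys_to_delete is hypothetical.

```python
def keys_to_delete(source_keys, dest_keys):
    """Return destination keys that also exist in the source, in destination order."""
    source = set(source_keys)  # set membership checks are O(1) on average
    return [key for key in dest_keys if key in source]


# The example listings from the question.
bucket_a = ["File R", "File G", "File C"]
bucket_b = ["File A", "File R", "File G", "File C", "File O"]

print(keys_to_delete(bucket_a, bucket_b))  # ['File R', 'File G', 'File C']
```

Building a set from the source listing keeps each lookup cheap, which matters once the listings reach millions of keys.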
Attempted Solution: The aws s3 sync CLI command includes the flag --delete. However, this flag only deletes files in the destination that aren't present in the source.
Is there any way to do this using aws s3 sync?
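To make the mismatch with --delete concrete, here is the same example expressed as set operations (a sketch; the bucket names in the comment are hypothetical). Sync's --delete picks the keys in B that are missing from A, while the question wants the keys in B that are present in A.

```python
bucket_a = {"File R", "File G", "File C"}
bucket_b = {"File A", "File R", "File G", "File C", "File O"}

# What `aws s3 sync s3://bucket-a s3://bucket-b --delete` removes from B:
# keys with no counterpart in A, i.e. the set difference B - A.
sync_delete_removes = sorted(bucket_b - bucket_a)

# What the question asks for: keys in B that DO exist in A, i.e. B & A.
wanted = sorted(bucket_b & bucket_a)

print(sync_delete_removes)  # ['File A', 'File O']
print(wanted)               # ['File C', 'File G', 'File R']
```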
2 Answers
I ended up solving this via S3 bucket lifecycle rules, where I specified a filter that matched the necessary files in both buckets. (Lifecycle rule filters match on key prefix, object tags, or object size rather than on regular expressions.)
Using S3 lifecycle rules allows any number of files (even millions) to be expired without paying for requests: S3 rounds expiration dates to midnight UTC and removes the objects asynchronously afterwards. By contrast, the aws s3 CLI has to list objects in order to delete them programmatically (in batches of up to 1,000 at a time).

For a one-off cleanup, if the number of files to delete is small, use the sequence of shell commands below. Run it with the --dryrun flag first and then, if the output looks as expected, without the flag. An explanation of each command is below.
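A hedged sketch of the small-case approach, not the answer's own commands: given the duplicate keys, build one aws s3 rm line per object, with --dryrun appended by default so the output can be reviewed before anything is deleted. The bucket name bucket-b and the helper rm_commands are hypothetical.

```python
import shlex


def rm_commands(bucket, keys, dryrun=True):
    """Build one `aws s3 rm` command line per key in the (hypothetical) bucket."""
    commands = []
    for key in keys:
        cmd = ["aws", "s3", "rm", f"s3://{bucket}/{key}"]
        if dryrun:
            cmd.append("--dryrun")  # preview only; drop once the output looks right
        # shlex.join quotes arguments that contain spaces or shell metacharacters
        commands.append(shlex.join(cmd))
    return commands


commands = rm_commands("bucket-b", ["File R", "File G", "File C"])
for line in commands:
    print(line)
```

The first line printed is aws s3 rm 's3://bucket-b/File R' --dryrun. Calling rm_commands(..., dryrun=False) produces the same lines without the flag.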
Alternatively, if the number of files is large, use the batch delete command below. Please note that the batch command does not have the --dryrun option. An explanation of the command is below.
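A sketch of the batching arithmetic, assuming the large-case route goes through S3's multi-object delete (the DeleteObjects API, exposed in the CLI as aws s3api delete-objects), which accepts at most 1,000 keys per request. The helper delete_batches and the key names are hypothetical; each yielded string has the {"Objects": [...], "Quiet": true} shape that the --delete parameter expects.

```python
import json

MAX_KEYS_PER_REQUEST = 1000  # DeleteObjects accepts at most 1,000 keys per call


def delete_batches(keys, batch_size=MAX_KEYS_PER_REQUEST):
    """Yield JSON payloads of at most batch_size keys for a multi-object delete."""
    keys = list(keys)
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        # Quiet mode: the response only reports keys that failed to delete.
        yield json.dumps({"Objects": [{"Key": k} for k in chunk], "Quiet": True})


# Hypothetical example: 2,500 duplicate keys become three requests.
keys = [f"file-{i}" for i in range(2500)]
payloads = list(delete_batches(keys))
print([len(json.loads(p)["Objects"]) for p in payloads])  # [1000, 1000, 500]
```

Each payload could then be saved to a file and passed as aws s3api delete-objects --bucket bucket-b --delete file://batch.json, checking the Errors field of each response for keys that were not removed.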