I have an S3 bucket that, due to a client decision, has neither versioning nor lifecycle rules enabled, and it contains data as old as 10 years. However, we also want to keep a backup of the files that have been worked on in the last 30 days.
I am planning to create a new S3 bucket, turn on versioning, and set a lifecycle rule to delete files older than 30 days. After that I will run a cronjob to do an aws s3 sync from the source bucket to the destination bucket.
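Roughly what I have in mind for the lifecycle rule, sketched with boto3 (the bucket name is a placeholder); it expires current objects after 30 days and, since versioning will be on, also expires noncurrent versions after 30 days:

```python
import boto3

s3 = boto3.client("s3")

# "backup-bucket" is a placeholder for the new destination bucket.
s3.put_bucket_lifecycle_configuration(
    Bucket="backup-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-after-30-days",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Expiration": {"Days": 30},
                # Clean up old versions too, since versioning is enabled.
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    },
)
```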
So, files older than 30 days will get deleted from the destination bucket, which is fine. My concern, however, is that the next aws s3 sync run will restore the old files that were deleted from the destination. Is that correct? If so, how do I resolve this and keep only the last 30 days of files?
2 Answers
Correct: sync will copy the deleted files back, so this isn't possible with s3 sync alone. It would be a perfect use case for S3 Event Notifications:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/EventNotifications.html
The idea would be to set up an event notification for object creation (new or modified files) on bucket A, and have that notification trigger a Lambda function that copies the file from bucket A to bucket B.
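A minimal sketch of such a handler (the destination bucket name is illustrative; this is not the exact code from the linked answer):

```python
import urllib.parse
import boto3

s3 = boto3.client("s3")
DESTINATION_BUCKET = "bucket-b"  # placeholder for the backup bucket

def lambda_handler(event, context):
    # Each record describes one object that was created or modified in bucket A.
    for record in event["Records"]:
        source_bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 event notifications are URL-encoded (spaces arrive as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Server-side copy into the backup bucket; the data never passes through Lambda.
        s3.copy_object(
            Bucket=DESTINATION_BUCKET,
            Key=key,
            CopySource={"Bucket": source_bucket, "Key": key},
        )
    return {"copied": len(event["Records"])}
```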
You could build your own replication.
This way, you will not require versioning, and you can also add logic to copy only objects under particular paths or with particular extensions (e.g. just .csv files).
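That filtering can even live in the event notification itself rather than in the Lambda code. A sketch with boto3, using hypothetical bucket, prefix, and function names:

```python
import boto3

s3 = boto3.client("s3")

# Only .csv objects created under incoming/ in bucket A trigger the copy function.
# The bucket name, prefix, and Lambda ARN below are all placeholders.
s3.put_bucket_notification_configuration(
    Bucket="bucket-a",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:copy-to-backup",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": "incoming/"},
                            {"Name": "suffix", "Value": ".csv"},
                        ]
                    }
                },
            }
        ]
    },
)
```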
The code is very simple; see: AWS-Lambda function (Python) to copy file from S3 – perform manipulation – store output in another S3 – Stack Overflow