I have some files in a local folder. These files can be modified locally. I need to keep a copy of the updated files on S3.
Is there a way I can check whether a local file is equal to a file on S3 (example: using checksum)? In this way I won’t have to upload the files that have not been changed.
I am using boto3 and Python.
2 Answers
Apparently, boto3’s S3 client doesn’t return a checksum value without actually downloading the objects. One workaround is to use the last modified timestamp: compare the LastModified value returned by the list_objects call with the last modification time of the local file. Since you are using Python, one way to get a file's modification time is os.path.getmtime.
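A rough sketch of that comparison (the bucket, key, and file names below are placeholders, and the listing call shown is list_objects_v2):

```python
import os
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

local_path = "report.csv"                        # placeholder local file
bucket, key = "my-bucket", "backups/report.csv"  # placeholder bucket/key

# Local modification time as a timezone-aware UTC datetime.
local_mtime = datetime.fromtimestamp(os.path.getmtime(local_path), tz=timezone.utc)

# LastModified for the matching object, as returned by the listing call.
listing = s3.list_objects_v2(Bucket=bucket, Prefix=key)
remote_mtime = listing["Contents"][0]["LastModified"]

# Upload only if the local copy is newer than the copy on S3.
if local_mtime > remote_mtime:
    s3.upload_file(local_path, bucket, key)
```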
Amazon S3 objects have an entity tag (ETag) that "represents a specific version of that object". It is a calculated checksum, which you can compare to an equivalently calculated checksum of the local file.
See: Using Content-MD5 and the ETag to verify uploaded objects
I would suggest first checking the length of the files, because it is a very cheap check and a different length immediately tells you the files are not the same. Then calculate the ETag of the local file and compare it with the ETag of the S3 object.
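A minimal sketch of that check, assuming the object was uploaded in a single part (so its ETag is a plain MD5 hex digest) and using placeholder bucket and key names:

```python
import hashlib
import os

import boto3

s3 = boto3.client("s3")

def is_up_to_date(local_path, bucket, key):
    """Return True if the local file appears identical to the S3 object."""
    head = s3.head_object(Bucket=bucket, Key=key)

    # Cheap check first: a different size means the files differ.
    if os.path.getsize(local_path) != head["ContentLength"]:
        return False

    # Compute the local MD5 and compare it with the ETag (minus its quotes).
    md5 = hashlib.md5()
    with open(local_path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            md5.update(chunk)
    return md5.hexdigest() == head["ETag"].strip('"')

if not is_up_to_date("report.csv", "my-bucket", "backups/report.csv"):
    s3.upload_file("report.csv", "my-bucket", "backups/report.csv")
```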
However, the ETag is not an MD5 digest in every case: multipart uploads and objects encrypted with SSE-KMS or SSE-C get different ETags. Therefore, the ETag method might not work on your particular bucket; try some experiments to confirm whether it is applicable. Worst case, you could always calculate an MD5 before upload, store it as metadata on the object during the upload, and use that to compare files in future.
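A sketch of that metadata approach; the md5 metadata key and the bucket/key names are just examples:

```python
import hashlib

import boto3

s3 = boto3.client("s3")

local_path = "report.csv"                        # placeholder local file
bucket, key = "my-bucket", "backups/report.csv"  # placeholder bucket/key

def file_md5(path):
    # Hex MD5 of the file contents, read in chunks.
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            md5.update(chunk)
    return md5.hexdigest()

# Record the MD5 yourself as user-defined metadata at upload time.
s3.upload_file(
    local_path, bucket, key,
    ExtraArgs={"Metadata": {"md5": file_md5(local_path)}},
)

# Later, head_object returns that metadata without downloading the object.
stored_md5 = s3.head_object(Bucket=bucket, Key=key)["Metadata"].get("md5")
needs_upload = stored_md5 != file_md5(local_path)
```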
S3 can even do this for you. See: New – Additional Checksum Algorithms for Amazon S3 | AWS News Blog
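For example, if you ask S3 to record a SHA-256 checksum at upload time, head_object can return it later without downloading the object. A sketch, assuming a single-part upload and the same placeholder names as above:

```python
import base64
import hashlib

import boto3

s3 = boto3.client("s3")

local_path = "report.csv"                        # placeholder local file
bucket, key = "my-bucket", "backups/report.csv"  # placeholder bucket/key

def sha256_b64(path):
    # Base64-encoded SHA-256, the format S3 uses for ChecksumSHA256.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")

# Ask S3 to compute and store a SHA-256 checksum during the upload.
with open(local_path, "rb") as f:
    s3.put_object(Bucket=bucket, Key=key, Body=f, ChecksumAlgorithm="SHA256")

# Later, read the stored checksum back and compare it with the local file.
head = s3.head_object(Bucket=bucket, Key=key, ChecksumMode="ENABLED")
unchanged = head.get("ChecksumSHA256") == sha256_b64(local_path)
```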