I have an EC2 instance on which an approx. 400 GB file (tar.gz) is stored.

I now want to unzip that file and upload its contents to an S3 bucket in the same AWS account.

With the normal aws s3 cp command I always ran into timeouts.

What is the easiest way to accomplish that task?

2 Answers


  1. I would recommend using s3 sync instead of s3 cp.

    aws s3 --region us-east-1 sync [folder] s3://[bucketname]
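
    Since the archive has to be unpacked before the folder can be synced, the whole step could look like the sketch below. It assumes there is enough free disk space for the extracted files; the function name extract_and_sync and its three arguments are hypothetical placeholders.

    ```shell
    # Unpack a .tar.gz archive into a working directory and sync that
    # directory to an S3 bucket. All three arguments are placeholders.
    extract_and_sync() {
        local archive=$1 workdir=$2 bucket=$3
        mkdir -p "$workdir" || return 1
        # -x extract, -z gunzip, -f archive file, -C target directory
        tar -xzf "$archive" -C "$workdir" || return 1
        # sync copies only new and updated files, so the command can
        # simply be re-run after a timeout or interruption.
        aws s3 sync "$workdir" "s3://$bucket/"
    }
    ```

    Usage: extract_and_sync /data/archive.tar.gz /data/extracted my-bucket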
    
  2. The sync should probably work. Under the hood it is most likely built on multipart upload, which is useful to know if you want to implement the upload yourself. That can be done even from the command line.

    https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-upload-object.html

    The process may look intimidating, but it is not that bad, and the benefit is that you can upload a large file even over a bad connection, because you can retry individual parts. You can also finish the upload later, for example after your temporary permissions expire, or the next day. You can possibly even upload from two locations, as long as the file is split in exactly the same way.
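
    Before splitting, it helps to pick a part size. S3 allows at most 10,000 parts per multipart upload, and every part except the last must be at least 5 MiB, so a 400 GB file needs parts of roughly 40 MB or more. The sketch below computes that minimum and splits the file; the function names are hypothetical, and the --numeric-suffixes=1 option assumes GNU split.

    ```shell
    # Smallest whole-MiB part size that keeps a file of the given size
    # (in bytes) within S3's limit of 10,000 parts.
    min_part_size_mib() {
        local file_bytes=$1
        local max_parts=10000
        local mib=$((1024 * 1024))
        # Round up in both divisions so the part count never exceeds the limit.
        local part_bytes=$(( (file_bytes + max_parts - 1) / max_parts ))
        local part_mib=$(( (part_bytes + mib - 1) / mib ))
        # Every part except the last must be at least 5 MiB.
        if [ "$part_mib" -lt 5 ]; then part_mib=5; fi
        echo "$part_mib"
    }

    # Split a file into parts named <prefix>00001, <prefix>00002, ...
    # so that the numeric suffix matches the S3 part number.
    split_into_parts() {
        local file=$1 part_mib=$2 prefix=$3
        split -b "${part_mib}m" --numeric-suffixes=1 -a 5 "$file" "$prefix"
    }
    ```

    Usage: split_into_parts archive.tar.gz 64 part creates part00001, part00002 and so on, each 64 MiB except possibly the last.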

    Initiate the upload:

    aws s3api create-multipart-upload --bucket my-bucket --key 'multipart/01'
    {
        "Bucket": "my-bucket",
        "UploadId": "dfRtDYU0WWCCcH43C3WFbkRONycyCpTJJvxu2i5GYkZljF.Yxwh6XG7WfS2vC4to6HiV6Yjlx.cph0gtNBtJ8P3URCSbB7rjxI5iEwVDmgaXZOGgkk5nVTW16HOQ5l0R",
        "Key": "multipart/01"
    }
    

    Upload part:

    aws s3api upload-part --bucket my-bucket --key 'multipart/01' --part-number 1 --body part01 --upload-id  "dfRtDYU0WWCCcH43C3WFbkRONycyCpTJJvxu2i5GYkZljF.Yxwh6XG7WfS2vC4to6HiV6Yjlx.cph0gtNBtJ8P3URCSbB7rjxI5iEwVDmgaXZOGgkk5nVTW16HOQ5l0R"
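
    With many parts, the upload-part call is usually wrapped in a loop that retries a failed part. The following is only a sketch: it assumes the parts are files named with a five-digit numeric suffix (for example part00001), the function name upload_parts and the RETRY_DELAY variable are hypothetical, and it relies on --query ETag --output text returning just the ETag of the uploaded part.

    ```shell
    # Upload every file named <prefix>NNNNN as part number NNNNN and print
    # one "<part number> <ETag>" line per uploaded part.
    upload_parts() {
        local bucket=$1 key=$2 upload_id=$3 prefix=$4
        local part suffix number etag attempt ok
        for part in "$prefix"*; do
            [ -e "$part" ] || continue
            suffix=${part#"$prefix"}
            # 10# forces base 10 so a suffix like 00008 is not read as octal.
            number=$((10#$suffix))
            ok=0
            for attempt in 1 2 3; do
                if etag=$(aws s3api upload-part --bucket "$bucket" --key "$key" \
                        --part-number "$number" --body "$part" \
                        --upload-id "$upload_id" --query ETag --output text); then
                    ok=1
                    break
                fi
                # Wait before retrying (seconds, default 5).
                sleep "${RETRY_DELAY:-5}"
            done
            if [ "$ok" -ne 1 ]; then
                echo "part $number failed after 3 attempts" >&2
                return 1
            fi
            echo "$number $etag"
        done
    }
    ```

    Usage: upload_parts my-bucket 'multipart/01' "$UPLOAD_ID" part > etags.txt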
    

    Complete multi-part:

    aws s3api complete-multipart-upload --multipart-upload file://mpustruct --bucket my-bucket --key 'multipart/01' --upload-id dfRtDYU0WWCCcH43C3WFbkRONycyCpTJJvxu2i5GYkZljF.Yxwh6XG7WfS2vC4to6HiV6Yjlx.cph0gtNBtJ8P3URCSbB7rjxI5iEwVDmgaXZOGgkk5nVTW16HOQ5l0R
    

    The mpustruct file contains:

    {
      "Parts": [
        {
          "ETag": "e868e0f4719e394144ef36531ee6824c",
          "PartNumber": 1
        },
        {
          "ETag": "6bb2b12753d66fe86da4998aa33fffb0",
          "PartNumber": 2
        },
        {
          "ETag": "d0a0112e841abec9c9ec83406f0159c8",
          "PartNumber": 3
        }
      ]
    }
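
    Writing this file by hand gets tedious with many parts. A minimal sketch for generating it, assuming you have kept one line per part in the form "<part number> <ETag>", with the ETag still wrapped in the double quotes that S3 returns; the function name build_mpustruct is hypothetical.

    ```shell
    # Read "<part number> <quoted ETag>" lines from stdin and print the
    # JSON structure expected by complete-multipart-upload.
    build_mpustruct() {
        local number etag sep=""
        printf '{"Parts": ['
        while read -r number etag; do
            # The ETag already carries its double quotes, so it is
            # inserted as-is and becomes a JSON string.
            printf '%s{"ETag": %s, "PartNumber": %s}' "$sep" "$etag" "$number"
            sep=", "
        done
        printf ']}\n'
    }
    ```

    Usage: build_mpustruct < etags.txt > mpustruct. The parts must be listed in ascending part-number order.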
    

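    The ETag values are returned by each upload-part call. The structure can also be reconstructed from the output of the list-parts command (list-multipart-uploads only lists the uploads that are still in progress, together with their upload IDs).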
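
    One way to read the uploaded parts back from S3 is the list-parts command combined with a --query expression that keeps only the two fields complete-multipart-upload needs. This is a sketch under that assumption; the function name fetch_mpustruct is hypothetical.

    ```shell
    # Print the parts already uploaded for a multipart upload, reduced to
    # the {"Parts": [{"ETag": ..., "PartNumber": ...}]} structure.
    fetch_mpustruct() {
        local bucket=$1 key=$2 upload_id=$3
        aws s3api list-parts --bucket "$bucket" --key "$key" \
            --upload-id "$upload_id" \
            --query '{Parts: Parts[].{ETag: ETag, PartNumber: PartNumber}}' \
            --output json
    }
    ```

    Usage: fetch_mpustruct my-bucket 'multipart/01' "$UPLOAD_ID" > mpustruct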
