I would like to move files from Backblaze B2 to Amazon S3. The instructions here say that I should download them to a local directory. However, I am trying to transfer about 180 TB of data, so I would prefer not to download the files locally.
I found this post with a similar question, but I was wondering if there was a way to do this using the command line instead of ForkLift.
Thank you
2 Answers
I’m assuming you mean AWS S3 when you say "to AWS"?
I don’t think there’s a way to do that without some sort of intermediate service storing it locally. I would run the transfer from an EC2 instance instead of from your local laptop, so that the data gets transferred from Backblaze directly into the AWS data center. Then it would just be a matter of copying the files from the EC2 instance to the S3 bucket, all within the same data center.
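For that final hop, a minimal sketch of copying from the instance’s local storage into S3; the directory and bucket name here are placeholders, not anything from the question:

```bash
# On the EC2 instance: upload everything under /mnt/transfer
# into the destination bucket (both names are placeholders)
aws s3 sync /mnt/transfer s3://my-s3-bucket/
```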
Yes, you can do this using the AWS CLI. The `aws s3 cp` command can read from stdin or write to stdout by using `-` instead of a filename, so you can pipe two `aws s3 cp` commands together to read a file from Backblaze B2 and write it to Amazon S3 without it hitting the local disk.

First, configure two AWS profiles from the command line – one for B2 and the other for AWS. `aws configure` will prompt you for the credentials for each account:
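For example (a sketch: the profile names `b2` and `aws` are arbitrary choices, and the bracketed values are placeholders for your own keys):

```
$ aws configure --profile b2
AWS Access Key ID [None]: <B2 application key ID>
AWS Secret Access Key [None]: <B2 application key>
Default region name [None]:
Default output format [None]:

$ aws configure --profile aws
AWS Access Key ID [None]: <AWS access key ID>
AWS Secret Access Key [None]: <AWS secret access key>
Default region name [None]: us-east-1
Default output format [None]:
```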
Now you can specify the profiles in the two `aws s3 cp` commands. Note that the first `aws s3 cp` command also needs the `--endpoint-url` argument, since it can’t be set in the profile. It’s easy to run a quick test on a single file.
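A minimal sketch of such a test, using the profile names configured above; the bucket names and the B2 endpoint URL are placeholders you’d replace with your own:

```bash
# Read hello.txt from B2 to stdout, and pipe it straight into S3
aws --profile b2 --endpoint-url https://s3.us-west-002.backblazeb2.com \
    s3 cp s3://my-b2-bucket/hello.txt - |
aws --profile aws \
    s3 cp - s3://my-s3-bucket/hello.txt
```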
One wrinkle is that, if the file is more than 50 GB, you will need to use the `--expected-size` argument to specify the file size so that the `cp` command can split the stream into parts for a large file upload. From the AWS CLI docs:
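> `--expected-size` (string) This argument specifies the expected size of a stream in terms of bytes. Note that this argument is needed only when a stream is being uploaded to s3 and the size is larger than 50GB. Failure to include this argument under these conditions may result in a failed upload due to too many parts in upload.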
Here’s a one-liner that copies the contents of a bucket on B2 to a bucket on S3, outputting the filename (object key) and size of each file. It assumes you’ve set up the profiles as above:
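A sketch of that pipeline, assuming the `b2` and `aws` profiles from above; the bucket names and endpoint URL are placeholders, and it assumes object keys without whitespace:

```bash
# List every object key and size in the B2 bucket, then stream each
# object from B2 into S3 via stdout/stdin, passing --expected-size
# so files over 50 GB can be split into multipart-upload parts
aws --profile b2 --endpoint-url https://s3.us-west-002.backblazeb2.com \
    s3api list-objects-v2 --bucket my-b2-bucket \
    --query 'Contents[].[Key,Size]' --output text |
while read -r key size; do
  echo "$key ($size bytes)"
  aws --profile b2 --endpoint-url https://s3.us-west-002.backblazeb2.com \
      s3 cp "s3://my-b2-bucket/$key" - |
  aws --profile aws \
      s3 cp - "s3://my-s3-bucket/$key" --expected-size "$size"
done
```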
Although this technique does not hit the local disk, the data still has to flow from B2 to wherever this script is running, then to S3. As @Mark B mentioned in his answer, run the script on an EC2 instance for best performance.