
I would like to move files from Backblaze B2 to Amazon S3. The instructions here say that I should download them to a local directory. However, I am trying to transfer about 180 TB of data so I would prefer to not have to download them locally.

I found this post with a similar question, but I was wondering if there was a way to do this using the command line instead of ForkLift.

Thank you

2 Answers


  1. I’m assuming you mean AWS S3 when you say "to AWS"?

    I don’t think there’s a way to do that without some sort of intermediate service storing it locally. I would run the transfer from an EC2 server instead of from your local laptop, so that the data gets transferred from Backblaze directly into the AWS data center. Then it would just be a matter of copying the files from the EC2 instance to the S3 bucket, all within the same data center.
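
    For example, here’s a rough sketch of that two-step approach, run on the EC2 instance. The bucket names, the Backblaze endpoint, and the local path /data are placeholders, and it assumes you’ve configured an AWS CLI profile named "b2" with your Backblaze application key:

    # Pull the file from Backblaze B2 onto the instance's local disk
    aws --profile b2 --endpoint-url 'https://<Your Backblaze bucket endpoint>' \
        s3 cp s3://<Your Backblaze bucket name>/filename.ext /data/filename.ext

    # Push it from the instance into the Amazon S3 bucket
    aws s3 cp /data/filename.ext s3://<Your AWS bucket name>/filename.ext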

  2. Yes, you can do this using the AWS CLI. The aws s3 cp command can read from stdin or write to stdout when you pass - in place of a filename, so you can pipe two aws s3 cp commands together to read a file from Backblaze B2 and write it to Amazon S3 without it ever hitting the local disk.

    First, configure two AWS profiles from the command line – one for B2 and the other for AWS. aws configure will prompt you for the credentials for each account:

    % aws configure --profile b2
    % aws configure --profile aws
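
    Behind the scenes, aws configure writes those keys into ~/.aws/credentials. With placeholder values, the two profiles end up looking something like this:

    [b2]
    aws_access_key_id = <Your Backblaze application key ID>
    aws_secret_access_key = <Your Backblaze application key>

    [aws]
    aws_access_key_id = <Your AWS access key ID>
    aws_secret_access_key = <Your AWS secret access key>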
    

    Now you can specify the profiles in the two aws s3 cp commands. Note that the first aws s3 cp command also needs the --endpoint-url argument, since it can’t be set in the profile:

    aws --profile b2 --endpoint-url 'https://<Your Backblaze bucket endpoint>' \
        s3 cp s3://<Your Backblaze bucket name>/filename.ext - \
    | aws --profile aws s3 cp - s3://<Your AWS bucket name>/filename.ext
    

    It’s easy to run a quick test on a single file:

    # Write a file to Backblaze B2
    % echo 'Hello world!' | 
    aws --profile b2 --endpoint-url 'https://s3.us-west-004.backblazeb2.com' \
        s3 cp - s3://metadaddy-b2/hello.txt
    
    # Copy file from Backblaze B2 to Amazon S3
    % aws --profile b2 --endpoint-url 'https://s3.us-west-004.backblazeb2.com' \
        s3 cp s3://metadaddy-b2/hello.txt - \
    | aws --profile aws s3 cp - s3://metadaddy-s3/hello.txt
    
    # Read the file from Amazon S3
    % aws --profile aws s3 cp s3://metadaddy-s3/hello.txt -
    Hello world!
    

    One wrinkle is that, if the file is more than 50 GB, you will need to use the --expected-size argument to specify the file size so that the cp command can split the stream into parts for a large file upload. From the AWS CLI docs:

    --expected-size (string) This argument specifies the expected size of a stream in terms of bytes. Note that this argument is needed only when a stream is being uploaded to s3 and the size is larger than 50GB. Failure to include this argument under these conditions may result in a failed upload due to too many parts in upload.
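
    For example, reusing the profiles and bucket names from above (the object name bigfile.bin is just a placeholder), you could look the size up with s3api head-object and feed it to the second cp:

    # Get the object's size in bytes from Backblaze B2
    SIZE=$(aws --profile b2 --endpoint-url 'https://s3.us-west-004.backblazeb2.com' \
        s3api head-object --bucket metadaddy-b2 --key bigfile.bin \
        --query ContentLength --output text)

    # Stream the large file from B2 to S3, telling cp how big it is
    aws --profile b2 --endpoint-url 'https://s3.us-west-004.backblazeb2.com' \
        s3 cp s3://metadaddy-b2/bigfile.bin - \
    | aws --profile aws s3 cp - s3://metadaddy-s3/bigfile.bin --expected-size "$SIZE"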

    Here’s a one-liner that copies the contents of a bucket on B2 to a bucket on S3, outputting the filename (object key) and size of each file. It assumes you’ve set up the profiles as above.

    aws --profile b2 --endpoint-url 'https://s3.us-west-004.backblazeb2.com' \
        s3api list-objects-v2 --bucket metadaddy-b2 \
    | jq '.Contents[] | .Key, .Size' \
    | xargs -n2 sh -c 'echo "Copying "$1" ($2 bytes)"; \
        aws --profile b2 --endpoint-url "https://s3.us-west-004.backblazeb2.com" \
            s3 cp "s3://metadaddy-b2/$1" - \
        | aws --profile aws s3 cp - "s3://metadaddy-s3/$1" --expected-size $2' sh
    

    Although this technique does not hit the local disk, the data still has to flow from B2 to wherever this script is running, then to S3. As @Mark B mentioned in his answer, run the script on an EC2 instance for best performance.
