I am trying to use an S3 link that was provided to me, https://ml-cloud-dataset.s3.amazonaws.com/Airlines_data.txt, from a PuTTY terminal, so that I can create a table in Hive and load the dataset into it.
I tried to download the dataset using this command:
aws s3 cp https://ml-cloud-dataset.s3.amazonaws.com/Airlines_data.txt /home/hadoop .
This command gave me an error, and I tried several other ways but still failed to get the data.
2 Answers
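If you use aws s3 cp, you need to have the AWS CLI installed. If you have it installed, you can upload a file using aws s3 cp with a local path as the source and an s3:// URI as the destination.
To download the file, you can use aws s3 cp with the s3:// URI (s3://ml-cloud-dataset/Airlines_data.txt, not the https:// URL) as the source and a local directory such as /home/hadoop/ as the destination.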
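A minimal sketch of both directions, wrapped in small shell functions so the argument order is easy to see. The bucket name my-bucket and the function names are placeholders for illustration, not anything from the question; the download path uses the bucket and object from the question's URL.

```shell
# Sketch only: "my-bucket" and the function names are hypothetical placeholders.

# Upload: source is a local path, destination is an s3:// URI.
upload_file() {
  aws s3 cp "$1" "s3://my-bucket/"
}

# Download: source is an s3:// URI, destination is a local directory.
# Note the s3:// form; aws s3 cp does not take an https:// URL as an S3 path.
download_dataset() {
  aws s3 cp "s3://ml-cloud-dataset/Airlines_data.txt" "$1"
}
```

For the case in the question, the call would be download_dataset /home/hadoop/ (a single destination argument, without the extra trailing dot).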
Depending on your AWS setup, you need to either configure access keys or use SSO to log in. If the machine is an EC2 instance, you can also use an IAM role, which lets you authenticate without SSO or access keys.
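As a rough illustration of those options, the functions below wrap the standard AWS CLI commands for each one. The function names and the profile name passed to login_sso are hypothetical; which option applies depends on how your account is set up.

```shell
# Sketch only: function names and the profile argument are placeholders.

# Option 1: store an access key ID and secret interactively.
setup_access_keys() {
  aws configure
}

# Option 2: log in through SSO using an already configured named profile.
login_sso() {
  aws sso login --profile "$1"
}

# Any option, including an EC2 instance with an IAM role:
# show which identity the CLI is actually using.
show_identity() {
  aws sts get-caller-identity
}
```

Running show_identity first is a quick way to tell whether the machine already has working credentials before trying the copy again.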
The URL https://ml-cloud-dataset.s3.amazonaws.com/Airlines_data.txt is saying:
Bucket: ml-cloud-dataset
Object: Airlines_data.txt
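The split into bucket and object can be sketched in plain shell. This assumes a virtual-hosted-style URL on the global endpoint (bucket-name.s3.amazonaws.com, no region in the host name); the function name https_to_s3_uri is made up for illustration.

```shell
# Sketch only: handles URLs of the form https://<bucket>.s3.amazonaws.com/<key>.
https_to_s3_uri() {
  rest="${1#https://}"                  # drop the scheme
  host="${rest%%/*}"                    # everything before the first slash
  bucket="${host%.s3.amazonaws.com}"    # strip the S3 endpoint suffix
  key="${rest#*/}"                      # everything after the first slash
  printf 's3://%s/%s\n' "$bucket" "$key"
}

https_to_s3_uri "https://ml-cloud-dataset.s3.amazonaws.com/Airlines_data.txt"
# prints: s3://ml-cloud-dataset/Airlines_data.txt
```

The resulting s3:// URI is the form the aws s3 commands expect.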
Fortunately, it is a publicly accessible bucket, so you can list its contents with the AWS CLI using aws s3 ls.
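A sketch of the listing, wrapped in a hypothetical function named list_dataset_bucket. The --no-sign-request option tells the CLI to skip credential signing; it is an assumption here that you may not have credentials configured, and it can be dropped if you do.

```shell
# Sketch only: the function name is a placeholder.
# --no-sign-request sends an anonymous request, which works only for public buckets.
list_dataset_bucket() {
  aws s3 ls "s3://ml-cloud-dataset/" --no-sign-request
}
```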
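You can copy the object to your own bucket using aws s3 cp with an s3:// URI as both the source and the destination.
To copy ALL the objects, use aws s3 sync (or aws s3 cp with --recursive) on the bucket itself.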
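Both copies can be sketched as follows. The destination bucket is passed in as an argument because it is yours to choose; the function names are hypothetical. Writing to your own bucket needs credentials with write access to it, so these calls are signed as usual.

```shell
# Sketch only: function names are placeholders; $1 is the name of your own bucket.

# Copy the single object from the public bucket into your bucket.
copy_one_object() {
  aws s3 cp "s3://ml-cloud-dataset/Airlines_data.txt" "s3://$1/Airlines_data.txt"
}

# Copy every object in the public bucket into your bucket.
copy_all_objects() {
  aws s3 sync "s3://ml-cloud-dataset/" "s3://$1/"
}
```

For example, copy_one_object my-bucket would place the file at s3://my-bucket/Airlines_data.txt, where my-bucket stands in for a bucket you own.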
However, if you are using Hive within AWS, you possibly don't even need to download the file; you could just reference it directly using s3://ml-cloud-dataset/Airlines_data.txt. You could also access it from Amazon Athena using that same path.