I have an S3 bucket "bucket1" containing a directory named "dir1", and inside this directory there are multiple files. I just want to build a list of all the file names in this directory in my PySpark code, but I haven't been able to. I am completely new to PySpark, so any leads would be helpful. Do I need to create a Spark session for it? Also, I don't want to use libraries like boto3.
2 Answers
Without boto3, you'll need to open a Spark session. Make sure your AWS credentials are configured; you can also use an IAM role here if you'll be deploying this somewhere.
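For example, one way to list the file names with nothing but a Spark session is to go through the Hadoop FileSystem API that Spark already ships with. A minimal sketch, assuming the s3a connector and your credentials are already configured; note that `spark._jvm` and `spark._jsc` are internal accessors, so treat this as a workaround rather than a stable API:

```python
from pyspark.sql import SparkSession

# A Spark session is enough; no boto3 needed. Credentials come from the
# environment, Hadoop config (fs.s3a.access.key / fs.s3a.secret.key), or an IAM role.
spark = SparkSession.builder.appName("list-s3-files").getOrCreate()

# Reach the Hadoop FileSystem API through Spark's JVM gateway
hadoop_conf = spark._jsc.hadoopConfiguration()
path = spark._jvm.org.apache.hadoop.fs.Path("s3a://bucket1/dir1/")
fs = path.getFileSystem(hadoop_conf)

# listStatus returns one FileStatus per object under the prefix
file_names = [status.getPath().getName() for status in fs.listStatus(path)]
print(file_names)
```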
In terms of performance, it's better to use boto3. But if you'd like to stay in PySpark, you can use the function input_file_name().
Example:
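A minimal sketch of that approach, assuming the files under s3a://bucket1/dir1/ can be read as plain text (swap in the reader that matches your format). Keep in mind it reads the file contents just to recover the paths, which is why the boto3 route is usually faster:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name

spark = SparkSession.builder.appName("list-s3-files").getOrCreate()

# Read everything under the prefix; replace spark.read.text with the reader
# that matches the actual file format (csv, json, parquet, ...)
df = spark.read.text("s3a://bucket1/dir1/*")

# input_file_name() tags each row with the full path of the file it came from
paths = (df.withColumn("path", input_file_name())
           .select("path")
           .distinct()
           .collect())

file_names = [row.path for row in paths]
print(file_names)
```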