
I am facing an issue when I try to write a file to S3 as CSV.
I am basically trying to overwrite an existing single CSV file in an S3 folder. Below is the piece of code I'm running.
[screenshot of the code]

I am getting the error below. My wild guess is that this is due to the single file present in the S3 folder: while overwriting, Spark first deletes the existing file, which also removes the S3 folder (a folder with no objects in it ceases to exist in S3), and the write then fails because no folder with the given name exists anymore. Hence the whole overwrite fails.

[screenshot of the error]

Any help to resolve this issue will be appreciated.

2 Answers


  1. Chosen as BEST ANSWER

    So this issue wasn't resolved directly; I had to use a workaround. It seems the issue is not with S3 but with Spark: once you read a CSV using Spark, you cannot write over that same CSV, because Spark reads lazily and the overwrite would delete its own input before it has been fully read.

    The workaround looked like this:

    1. Read from root/myfolder
    2. Make your data transformations
    3. Write the transformed data into root/mytempfolder
    4. Read from root/mytempfolder
    5. Write into root/myfolder

  2. Caching the dataset solves the problem, and you don't need to save the same data to multiple paths:

    dataframe.cache()
