
I’ve seen many iterations of this question but cannot seem to understand/fix this behavior.

I am on Azure Databricks working on DBR 10.4 LTS (Spark 3.2.1, Scala 2.12), trying to write a single CSV file to blob storage so that it can be dropped to an SFTP server. Unfortunately I could not use spark-sftp because I am on Scala 2.12 and could not get the library to work.

Given this is a small dataframe, I am converting it to pandas and then attempting to_csv.

to_export = df.toPandas()

pathToFile = '/dbfs/mnt/adls/Sandbox/user/project_name/testfile.csv'
to_export.to_csv(pathToFile, index=False)

I get the error: [Errno 2] No such file or directory: '/dbfs/mnt/adls/Sandbox/user/project_name/testfile.csv'

Based on the information in other threads, I create the directory:

    dbutils.fs.mkdirs("/dbfs/mnt/adls/Sandbox/user/project_name/")
    Out[40]: True

The response is true and the directory exists, yet I still get the same error. I’m convinced it is something obvious and I’ve been staring at it for too long to notice. Does anyone see what my error may be?

2 Answers


    • Python's pandas library recognizes the path only when it is in File API format, i.e. with the local /dbfs prefix (since you are using a mount). dbutils.fs.mkdirs, on the other hand, uses Spark API format (dbfs:/... or a bare /... path), which is different from File API format.

    • As you are creating the directory using dbutils.fs.mkdirs with the path /dbfs/mnt/adls/Sandbox/user/project_name/, this path is actually interpreted as dbfs:/dbfs/mnt/adls/Sandbox/user/project_name/. Hence, the directory is created in the DBFS root rather than in your mounted storage.

    dbutils.fs.mkdirs('/dbfs/mnt/repro/Sandbox/user/project_name/')
    

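    Listing the parent directory makes the problem visible (a small sketch; dbutils and display are available in Databricks notebooks, and the exact listing depends on your workspace):

    # The directory landed under dbfs:/dbfs/..., i.e. in the DBFS root,
    # not inside the mounted storage
    display(dbutils.fs.ls('dbfs:/dbfs/mnt/repro/Sandbox/user/'))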

    • So, you have to modify the code that creates the directory as follows:
    dbutils.fs.mkdirs('/mnt/repro/Sandbox/user/project_name/')
    #OR
    #dbutils.fs.mkdirs('dbfs:/mnt/repro/Sandbox/user/project_name/')
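
    As a sanity check (a minimal sketch using only the standard library os module), the directory created through the Spark API path is now visible to local-file libraries such as pandas under the /dbfs prefix:

    import os

    # File API view of the directory created above with dbutils.fs.mkdirs
    print(os.path.exists('/dbfs/mnt/repro/Sandbox/user/project_name/'))  # True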
    
    • Writing to the folder would now work without any issue (pdf here is the pandas DataFrame, i.e. to_export in the question):
    pdf.to_csv('/dbfs/mnt/repro/Sandbox/user/project_name/testfile.csv', index=False)
    

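    Applied back to the asker's original mount, the fixed sequence would look like this (a sketch; df is the asker's Spark DataFrame):

    # Spark API path (no /dbfs prefix) for dbutils
    dbutils.fs.mkdirs('dbfs:/mnt/adls/Sandbox/user/project_name/')

    # File API path (/dbfs prefix) for pandas
    to_export = df.toPandas()
    to_export.to_csv('/dbfs/mnt/adls/Sandbox/user/project_name/testfile.csv', index=False)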

    Are you working in a repo? If you are, .to_csv() will try to save to the working directory of your repo and will not be able to access DBFS.
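
    You can check where relative paths resolve from with a quick sketch (standard library only; the printed path is illustrative):

    import os

    # Inside a repo this prints something like /Workspace/Repos/<user>/<repo>,
    # which is not under /dbfs
    print(os.getcwd())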

    To export your Spark df as CSV to DBFS, try:

    (sparkdf.coalesce(1)
            .write.format("com.databricks.spark.csv")
            .option("header", "true")
            .save("dbfs:/path/to/file.csv"))
    

    Your CSV file will be at dbfs:/path/to/file.csv/part-00000-tid-XXXXXXXX.csv.
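
    If you need a single file with a fixed name (e.g. for the SFTP drop), one option is to move the part file afterwards; a minimal sketch, assuming the output directory from above and an illustrative target name single.csv:

    # Locate the generated part file inside the Spark output directory
    part_file = [f.path for f in dbutils.fs.ls('dbfs:/path/to/file.csv')
                 if f.name.startswith('part-')][0]

    # Move it to a fixed name, then remove the Spark output directory
    dbutils.fs.mv(part_file, 'dbfs:/path/to/single.csv')
    dbutils.fs.rm('dbfs:/path/to/file.csv', True)  # True = recurse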
