I’ve seen many iterations of this question but cannot seem to understand/fix this behavior.
I am on Azure Databricks (DBR 10.4 LTS, Spark 3.2.1, Scala 2.12) trying to write a single CSV file to blob storage so that it can be dropped to an SFTP server. I could not use spark-sftp because, unfortunately, I am on Scala 2.12 and could not get the library to work.
Given this is a small dataframe, I am converting it to pandas and then attempting to_csv:

pathToFile = '/dbfs/mnt/adls/Sandbox/user/project_name/testfile.csv'
to_export = df.toPandas()
to_export.to_csv(pathToFile, index=False)
I get the error: [Errno 2] No such file or directory: '/dbfs/mnt/adls/Sandbox/user/project_name/testfile.csv'
Based on the information in other threads, I create the directory with dbutils.fs.mkdirs("/dbfs/mnt/adls/Sandbox/user/project_name/"), which returns:
Out[40]: True
The response is true and the directory exists, yet I still get the same error. I’m convinced it is something obvious and I’ve been staring at it for too long to notice. Does anyone see what my error may be?
2 Answers
Python's pandas library recognizes the path only when it is in File API format (since you are using a mount), whereas dbutils.fs.mkdirs uses Spark API format, which is different. Because you created the directory with dbutils.fs.mkdirs and the path /dbfs/mnt/adls/Sandbox/user/project_name/, that path was actually interpreted as dbfs:/dbfs/mnt/adls/Sandbox/user/project_name/. Hence, the directory was created within DBFS itself, under a spurious /dbfs prefix, rather than on your mount.
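A minimal sketch of the corrected flow, assuming the same mount path as in the question:

# Create the directory with the Spark API path (no /dbfs prefix);
# dbutils resolves this to dbfs:/mnt/adls/Sandbox/user/project_name/
dbutils.fs.mkdirs("/mnt/adls/Sandbox/user/project_name/")

# pandas runs on the driver's local filesystem, so it needs the
# File API path, i.e. the same mount exposed under /dbfs
to_export = df.toPandas()
to_export.to_csv("/dbfs/mnt/adls/Sandbox/user/project_name/testfile.csv", index=False)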
Are you working in a repo? Because if you are, .to_csv() will try to save to the working directory of your repo and will not be able to access DBFS. To export your Spark DataFrame as CSV to DBFS, try:
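A sketch of the write (the coalesce(1) and the header/mode options are assumptions; adjust the path to your target):

# Coalesce to a single partition so Spark writes one part file,
# then save via the Spark API path (dbfs:/ prefix)
(df.coalesce(1)
   .write
   .option("header", "true")
   .mode("overwrite")
   .csv("dbfs:/path/to/file.csv"))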
Your CSV file will be at dbfs:/path/to/file.csv/part-00000-tid-XXXXXXXX.csv
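If a single file with a predictable name is needed for the SFTP drop, one possible follow-up (an assumption, not part of the original answer) is to copy the part file with dbutils:

# Find the part file Spark produced inside the output directory
# and copy it to a stable, single-file name
part = [f.path for f in dbutils.fs.ls("dbfs:/path/to/file.csv")
        if f.name.startswith("part-")][0]
dbutils.fs.cp(part, "dbfs:/path/to/testfile.csv")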