I’m using an Azure Databricks notebook to read an Excel file from a folder inside a mounted Azure Blob Storage account.
The mounted Excel location is: "/mnt/2023-project/dashboard/ext/Marks.xlsx"
2023-project is the mount point and dashboard is the name of the container.
When I do a `dbutils.fs.ls`, I can see all the files inside the `ext` folder. There are a lot of `os` functions used in the code, as it was developed in a different environment. When I do an `os.listdir` on the `ext` folder, I get an error: No such file or directory. When I do an `os.listdir` on the `dashboard` container, I get `mount.err` as the output. While reading the Excel file using `pandas` or `openpyxl`, I also get an error: No such file or directory.
I’m using DBR 12.1 (includes Apache Spark 3.3.1, Scala 2.12). I mounted the Azure storage using the credential passthrough method.
```python
configs = {
    'fs.azure.account.auth.type': 'CustomAccessToken',
    'fs.azure.account.custom.token.provider.class': spark.conf.get('spark.databricks.passthrough.adls.gen2.tokenProviderClassName')
}
```
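For context, the full mount call these configs feed into typically looks like the sketch below. The storage-account name is a placeholder, and `dbutils.fs.mount` / `spark` only exist inside a Databricks notebook, so this is an illustration rather than something runnable elsewhere:

```python
# Sketch: mounting an ADLS Gen2 container with credential passthrough.
# <storage-account> is a placeholder; dbutils/spark are notebook globals.
configs = {
    'fs.azure.account.auth.type': 'CustomAccessToken',
    'fs.azure.account.custom.token.provider.class': spark.conf.get('spark.databricks.passthrough.adls.gen2.tokenProviderClassName')
}

dbutils.fs.mount(
    source='abfss://dashboard@<storage-account>.dfs.core.windows.net/',
    mount_point='/mnt/2023-project',
    extra_configs=configs,
)
```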
Please help with this. I’m relatively new to Databricks.
2 Answers
From the docs: local file APIs such as `pandas` read through the `/dbfs` FUSE mount rather than through DBFS paths directly. As this is the case for pandas, try with
"/dbfs/mnt/2023-project/dashboard/ext/Marks.xlsx"
There is a lot more information in the provided link as to how and why.
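The mapping between the two path styles is mechanical: whatever `dbutils.fs`/Spark sees as `/mnt/...`, local file APIs (`pandas`, `open`, `os.listdir`) see as `/dbfs/mnt/...`. A tiny helper (the name `dbfs_to_local` is made up for illustration) shows the translation:

```python
def dbfs_to_local(path: str) -> str:
    """Translate a DBFS path (e.g. /mnt/...) into the /dbfs FUSE path
    that local file APIs such as pandas and open() understand."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute DBFS path like /mnt/...")
    return "/dbfs" + path

# pandas would then read the file via the FUSE path, e.g.:
# import pandas as pd
# df = pd.read_excel(dbfs_to_local("/mnt/2023-project/dashboard/ext/Marks.xlsx"))
print(dbfs_to_local("/mnt/2023-project/dashboard/ext/Marks.xlsx"))
```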
The presence of the `mount.err` file indicates an error accessing the mount; you can check the content of this file from the notebook. This most probably happens because the mount was created with a custom token, which isn’t supported by the DBFS FUSE layer.
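One way to inspect the error file is a plain local-file read, since `mount.err` is surfaced through the `/dbfs` FUSE view at the mount point (the exact path is an assumption based on the question’s layout):

```python
# Sketch: read mount.err via the local /dbfs FUSE path (path assumed
# from the question's mount layout; runs inside a Databricks notebook).
with open("/dbfs/mnt/2023-project/mount.err") as f:
    print(f.read())
```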
As a workaround, I would suggest the following: copy the file from DBFS to the local disk and read it from there. This should work because `dbutils.fs` commands support that custom token. Copy the file first, and then read the
`/tmp/Marks.xlsx`
file using pandas or another package that uses the local file API.
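The copy-then-read workaround can be sketched as below; the source path comes from the question, `file:/tmp/Marks.xlsx` is an assumed local target, and `dbutils` is only available inside a Databricks notebook:

```python
# Sketch of the workaround: copy via dbutils.fs (which honours the
# passthrough token) to the driver's local disk, then read with pandas.
import pandas as pd

dbutils.fs.cp(
    "/mnt/2023-project/dashboard/ext/Marks.xlsx",  # DBFS source (from the question)
    "file:/tmp/Marks.xlsx",                        # local disk on the driver
)

df = pd.read_excel("/tmp/Marks.xlsx")  # plain local-file read, no FUSE involved
```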