
I’m using an Azure Databricks notebook to read an Excel file from a folder inside a mounted Azure Blob Storage container.

The mounted Excel file's location is "/mnt/2023-project/dashboard/ext/Marks.xlsx", where 2023-project is the mount point and dashboard is the name of the container.

When I run dbutils.fs.ls I can see all the files inside the ext folder. The code uses a lot of os functions because it was developed in a different environment.

When I run os.listdir on the ext folder, I get the error "No such file or directory". When I run os.listdir on the dashboard container, I get mount.err as the only output. When reading the Excel file using pandas or openpyxl, I also get "No such file or directory".

I’m using DBR 12.1 (includes Apache Spark 3.3.1, Scala 2.12). I mounted the Azure storage using the credential passthrough method:

configs = {
  'fs.azure.account.auth.type': 'CustomAccessToken',
  'fs.azure.account.custom.token.provider.class': spark.conf.get('spark.databricks.passthrough.adls.gen2.tokenProviderClassName')
}
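
The mount itself was created roughly like this (the storage account name below is a placeholder, and the exact source URL may differ):

dbutils.fs.mount(
  # Placeholder storage account name
  source = "abfss://dashboard@<storage-account>.dfs.core.windows.net/",
  mount_point = "/mnt/2023-project",
  extra_configs = configs
)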

Please help me with this. I’m relatively new to Databricks.

2 Answers


  1. From the docs:

    When using commands that default to the driver volume, you must use
    /dbfs before the path.

    As this is the case for pandas, try with
    "/dbfs/mnt/2023-project/dashboard/ext/Marks.xlsx"

    There is a lot more information in the provided link as to how and why.
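
    For example, a minimal sketch with pandas, assuming the mount from the question:

    import pandas as pd

    # The /dbfs prefix exposes DBFS through the local file API that pandas uses
    df = pd.read_excel("/dbfs/mnt/2023-project/dashboard/ext/Marks.xlsx")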

  2. The presence of mount.err indicates an error accessing the mount. You can check the content of this file using the following command in the notebook:

    %sh cat /dbfs/mnt/2023-project/mount.err
    

    This most probably happens because the custom token used when mounting isn’t supported by the DBFS FUSE layer.

    As a workaround, I would suggest the following: copy the file from DBFS to the local disk and read it from there. This should work because dbutils.fs commands support that custom token. First do:

    dbutils.fs.cp("/mnt/2023-project/dashboard/ext/Marks.xlsx", 
      "file:/tmp/Marks.xlsx")
    

    and then read the /tmp/Marks.xlsx file using pandas or another package that uses the local file API.
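
    For example, a minimal sketch assuming the copy above succeeded:

    import pandas as pd

    # Read the local copy created by dbutils.fs.cp
    df = pd.read_excel("/tmp/Marks.xlsx")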
