
I wanted to create a subdirectory inside a directory in one of my Azure Blob Storage containers. I know that it is not possible to do this via the UI, so I created a Databricks notebook and executed the following command:

dbutils.fs.mkdirs("/mnt/<containername>/directory/subdirectory")

The command executes, that is, it does not throw any error, and it creates everything up to the directory level. But when it comes to the subdirectory, the code does not create one. All the mount points are correct.

Our team used the same code (back in 2021, I guess) to create a subdirectory; it worked then, but it is not working now. Can someone help me with this?

Thank you.

3 Answers


  1. You are probably mixing up vanilla Blob Storage with Azure Data Lake Storage (ADLS). ADLS adds, among other things, a true hierarchical namespace to blob storage, which allows you to use it like a real filesystem. With traditional blob storage, all folders and hierarchies are just blobs with slashes in the filenames. They won't work with tools that do not explicitly know how to work with blob storage.

    When working with Databricks, I would suggest using ADLS instead of standard Blob Storage if you want to work with hierarchies.
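
    The slashes-in-names behaviour can be illustrated with the plain Azure SDK. The sketch below is illustrative only; the connection string, container name, and blob names are placeholders, not a verified setup.

    # Illustrative sketch using the azure-storage-blob SDK; all names are placeholders.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<connection-string>")
    container = service.get_container_client("<containername>")

    # Uploading a blob named "directory/subdirectory/file.txt" makes both
    # "directory/" and "directory/subdirectory/" show up as virtual folders...
    container.upload_blob("directory/subdirectory/file.txt", b"hello")

    # ...but there is no standalone folder object; only the blob itself is listed,
    # and deleting it makes the "folders" disappear as well.
    for blob in container.list_blobs(name_starts_with="directory/"):
        print(blob.name)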

  2. First, there are many utilities and commands that can be used to work with storage. These include Spark-aware commands like dbutils.fs.*, plain old Python libraries, and Unix commands via the shell magic. Non-Spark libraries require the "dbfs" prefix in various forms and cannot accept URL-based paths; see the sketch below for samples.

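    Here is a rough sketch of those three styles; the mount path is taken from the question as a placeholder, not a verified setup:

    # 1. Spark-aware dbutils takes DBFS paths (or URLs) directly.
    dbutils.fs.ls("/mnt/<containername>/directory")

    # 2. Shell commands need the /dbfs FUSE prefix, e.g. in a %sh cell:
    #    ls /dbfs/mnt/<containername>/directory

    # 3. Plain Python libraries also need the /dbfs prefix and cannot take URLs.
    with open("/dbfs/mnt/<containername>/directory/file.txt") as f:
        print(f.read())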

    Second, there is storage that is automatically attached to the cluster, which I consider local. However, nothing is really local in the cloud!

    %fs ls /
    

    If you execute the above command, you will see the file system that is part of the data plane. If you have never seen it before, here is a diagram of both the control and data planes. What is missing from this diagram is that parts of DBFS can be mount points to remote storage. Also, if you use URLs, you talk directly to the remote storage service.

    [diagram: Databricks control plane and data plane]
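
    As a small illustration (container, account, and path names are placeholders), the same data can be reached through the mount point or addressed directly by URL, assuming credentials are already configured:

    # Through the DBFS mount point
    display(dbutils.fs.ls("/mnt/<containername>/directory"))

    # Directly against the storage service by URL
    display(dbutils.fs.ls("wasbs://<container>@<account>.blob.core.windows.net/directory"))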

    Azure Blob Storage is the foundation for both flat (blob) and hierarchical containers. It is all the same foundation nowadays, but I suggest using ADLS Gen2 for both the RBAC and ACL security layers.

    Please see the article below for details. Some people mount the storage when the cluster comes up; this can be done via additional cluster configuration commands. Others prefer passing the credentials via Spark session configuration in a notebook (a sketch follows the link below). I find this technique fragile: if you do not have access to the program that sets the Spark configuration, you do not have access to the remote storage.

    https://docs.databricks.com/external-data/azure-storage.html
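
    A minimal sketch of that session-configuration pattern, following the article above; every angle-bracketed value is a placeholder, and the client secret is read from a secret scope rather than hard-coded:

    # Hedged sketch: pass ADLS Gen2 credentials via Spark session configuration.
    # <storageaccount>, <application-id>, <scope>, <key> and <tenant-id> are placeholders.
    spark.conf.set("fs.azure.account.auth.type.<storageaccount>.dfs.core.windows.net", "OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type.<storageaccount>.dfs.core.windows.net",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id.<storageaccount>.dfs.core.windows.net",
                   "<application-id>")
    spark.conf.set("fs.azure.account.oauth2.client.secret.<storageaccount>.dfs.core.windows.net",
                   dbutils.secrets.get(scope="<scope>", key="<key>"))
    spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storageaccount>.dfs.core.windows.net",
                   "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

    # Afterwards, URL-based paths talk to the storage account directly.
    dbutils.fs.ls("abfss://<container>@<storageaccount>.dfs.core.windows.net/directory")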

    My guess from your statement above is that the storage was mounted at some point but the mount no longer exists. How did I determine this? It is traditional to mount storage under /mnt.

    If you are using Blob Storage, you are probably using a Shared Access Signature (SAS). Those are time-based and expire. More details need to be supplied to narrow down the issue.

    If you are using ADLS, you are probably using a service principal to mount storage. This means the principal needs both the Storage Blob Data Contributor role at the RBAC layer and "rwx" at the ACL layer. Use Azure Storage Explorer to assign the rights.

    https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control-model
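
    For completeness, here is a hedged sketch of mounting ADLS Gen2 with a service principal, mirroring the documentation linked above; every angle-bracketed value is a placeholder:

    # Sketch only: mount ADLS Gen2 using a service principal; placeholders throughout.
    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": "<application-id>",
        "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope>", key="<key>"),
        "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    }

    dbutils.fs.mount(
        source="abfss://<container>@<storageaccount>.dfs.core.windows.net/",
        mount_point="/mnt/<containername>",
        extra_configs=configs)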

    If you are curious, use the following command to see existing mount points.

    display(dbutils.fs.mounts())
    

    In summary, there are many ways to play with storage. More details are needed to narrow down your exact problem.

  3. I have reproduced the above and got the same results when I used a plain Blob Storage account (hierarchical namespace not enabled).


    I mounted it and tried to create the folder structure sample4/sample5.


    It gave me True, but in Blob Storage no sub-folder sample5 was created; only a block blob was created.


    Only a block blob is created for the last sub-directory, irrespective of the folder depth.

    As @bursson said, in Blob Storage everything is a block blob. If you check, when my intermediate directory is sample2, a block blob is also created along with the folder in blob1.


    When I tried this with Data Lake Storage Gen2, all folders and sub-folders were created with the same code, and they show up as real folders in the container.


    So, if you want to create sub-directories from Databricks, use ADLS Gen2 and mount it (the abfss driver is the recommended way to mount ADLS Gen2).
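
    A rough reconstruction of the commands behind the screenshots, using placeholder mount names, looks like this:

    # Rough reconstruction; /mnt/blobmount and /mnt/adlsmount are placeholder mount points.
    dbutils.fs.mkdirs("/mnt/blobmount/sample4/sample5")   # returns True on a flat blob container,
                                                          # but only a placeholder block blob appears
    dbutils.fs.mkdirs("/mnt/adlsmount/sample4/sample5")   # on an ADLS Gen2 mount, real folders are created
    display(dbutils.fs.ls("/mnt/adlsmount/sample4"))      # sample5/ shows up as a directory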
