
I got a SAS token that was created for a specific folder on my Azure Data Lake Gen2 storage account. The goal is to download the folder with all its contents.

I understand that I can create a DataLakeServiceClient, a FileSystemClient, or a DataLakeDirectoryClient as follows:

from azure.storage.filedatalake import DataLakeServiceClient

# configuration (note: Data Lake Gen2 clients use the dfs endpoint, not blob)
url = 'https://my-account.dfs.core.windows.net'
sas_token = '<sas-token>'
file_system_name = 'file_system_1'
subfolder_path = 'subfolder_1'

# service client
data_lake_service_client = DataLakeServiceClient(account_url=url, credential=sas_token)

# file system client and directory client
file_system_client = data_lake_service_client.get_file_system_client(file_system=file_system_name)
data_lake_directory_client = data_lake_service_client.get_directory_client(file_system=file_system_name, directory=subfolder_path)
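
As a quick sanity check (a hypothetical addition, assuming the SAS grants at least read permission on the directory), the directory client can confirm that the token actually reaches the folder:

# sketch: raises an error if the SAS does not grant access to the directory
properties = data_lake_directory_client.get_directory_properties()
print(properties.name, properties.last_modified)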

Now to download specific files, I need to know what files exist:

  • Unfortunately, the DataLakeDirectoryClient does not have a method to list the paths of the files inside that directory.

  • The FileSystemClient, on the other hand, does have such a method, but it searches at the file system level, where my SAS token does not have access.

How do I list and download all files in my directory?

2 Answers


  1. Now to download specific files, I need to know what files exist

    You can list the files under a specific directory with the get_paths() method of FileSystemClient, passing the directory as the path argument, and then filter out the entries that are themselves directories.

    How do I list and download all files in my directory?

    Here’s sample code that lists and downloads all files in a directory:

    # List all files in the directory (get_paths is recursive by default)
    paths = file_system_client.get_paths(path=subfolder_path)
    file_paths = [path.name for path in paths if not path.is_directory]
    
    # Download the files
    import os
    destination_folder = "path_to_save_files"
    os.makedirs(destination_folder, exist_ok=True)
    
    for file_path in file_paths:
        file_client = file_system_client.get_file_client(file_path)
        dest_path = os.path.join(destination_folder, os.path.basename(file_path))
        
        with open(dest_path, "wb") as f:
            data_stream = file_client.download_file()
            f.write(data_stream.readall())
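
    Note that os.path.basename flattens the folder tree, so two files with the same name in different subdirectories would overwrite each other. A minimal sketch of a variant that preserves the relative layout, assuming the same clients and variables as above:

    # Sketch: re-root each remote path under destination_folder, keeping subfolders
    for file_path in file_paths:
        relative_path = os.path.relpath(file_path, subfolder_path)
        dest_path = os.path.join(destination_folder, relative_path)
        os.makedirs(os.path.dirname(dest_path), exist_ok=True)
        
        with open(dest_path, "wb") as f:
            f.write(file_system_client.get_file_client(file_path).download_file().readall())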
    
  2. I reproduced this in my environment and got the expected results as below:

    In the portal, I created a SAS for subfolder2; that folder contains a few files (screenshot omitted).

    Code which worked for me:

    from azure.storage.filedatalake import DataLakeServiceClient
    import os
    
    url1 = "https://accountname.dfs.core.windows.net"
    sas = "sp=racwdlmeop&st=2023-08-30T10:46:22Z&se=2023-08-30T18:6:2Z&spr=https&sv=2022-11-02&sr=d&sig=kCkP25wyg%2A5"
    subfolderpath = "folder2/subfolder2"
    data_lake_service_client = DataLakeServiceClient(account_url=url1, credential=sas)
    containername = "rithwik"
    
    fsc = data_lake_service_client.get_file_system_client(file_system=containername)
    
    # get_paths is recursive by default; skip directory entries so only files are downloaded
    portalfolder = [path.name for path in fsc.get_paths(path=subfolderpath) if not path.is_directory]
    
    for fp in portalfolder:
        fct = fsc.get_file_client(file_path=fp)
        retrievedfile = fct.download_file()
        print(f"Downloading {fp}")
        localSaving = os.path.join(r'C:\Users\Desktop\New folder', fp)
        # create the local subfolders before writing the file
        os.makedirs(os.path.dirname(localSaving), exist_ok=True)
        with open(localSaving, "wb") as f:
            f.write(retrievedfile.readall())
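
    For reuse, the listing and downloading steps can be wrapped into one helper. This is a sketch under the same assumptions as above (a directory-scoped SAS and the dfs endpoint); the name download_directory is made up for illustration:

    from azure.storage.filedatalake import DataLakeServiceClient
    import os
    
    def download_directory(account_url, sas, file_system, directory, destination):
        # download every file under `directory` into `destination`, keeping the layout
        service = DataLakeServiceClient(account_url=account_url, credential=sas)
        fs = service.get_file_system_client(file_system=file_system)
        for path in fs.get_paths(path=directory):  # recursive by default
            if path.is_directory:
                continue
            local_path = os.path.join(destination, os.path.relpath(path.name, directory))
            os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
            with open(local_path, "wb") as f:
                f.write(fs.get_file_client(path.name).download_file().readall())
    
    # usage with the values from this answer
    download_directory(url1, sas, containername, subfolderpath, r'C:\Users\Desktop\New folder')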
    

    Output: (screenshot omitted)

    Downloaded Locally: (screenshot omitted)
