
I’m not sure why, but calling this DataLakeDirectoryClient.CreateSubDirectoryAsync function creates a sub directory and a zero byte file of the same name. I only want it to create the sub directory. Note: I can see the zero byte file in Azure portal’s Storage Browser, but not in Azure Storage Explorer.

myDirectoryClient is of type Azure.Storage.Files.DataLake.DataLakeDirectoryClient

DataLakeDirectoryClient mySubDirectoryClient = await myDirectoryClient.CreateSubDirectoryAsync("my_sub_dir");

Now I have the zero byte file, but no directory.
Next, I upload a file to the sub directory. fileName, mylocalFilePath are set elsewhere with valid values. The new file creation and upload works fine.

var newFileClient = mySubDirectoryClient.GetFileClient(fileName);
await using var uploadFs = File.OpenRead(mylocalFilePath);
var response = await newFileClient.UploadAsync(uploadFs);

Now I have the zero byte file, and the sub directory, and the file in the sub directory. Files/folders in picture have a different name (not "my_sub_dir"), but they were created the same way.
[screenshot]

Is there a reason I have the extra zero byte file? Can I prevent this? Or do I just need to delete it afterwards? Or would deleting it be an issue?

I think I understand why the empty file is created: I believe the service doesn’t treat the path as a directory until it contains a file, much like a folder disappearing once you delete all of the files in it. I’d like to create the direct

Edit: The code above is only snippets, so I am posting the entire function below for clarity. It is a recursive function meant to copy all the files and subdirectories from a directory. It downloads each file locally, then uploads it to another location on a data lake. As far as my question is concerned, the only code that should matter is in the if (path.IsDirectory ?? false) branch.

async Task CopyDirectory(DataLakeDirectoryClient sourceDirectoryClient, DataLakeDirectoryClient targetDirectoryClient)
{
  var pathPages = sourceDirectoryClient.GetPathsAsync();
  var tasks = new List<Task>();
  await foreach (var page in pathPages.AsPages())
  {
    foreach (var path in page.Values)
    {
      var fileName = path.Name.Split("/").Last();
      if (path.IsDirectory??false)
      {
        var sourceSubDirectoryClient = sourceDirectoryClient.GetSubDirectoryClient(fileName);
        var targetSubDirectoryClient = await targetDirectoryClient.CreateSubDirectoryAsync(fileName);
        await CopyDirectory(sourceSubDirectoryClient, targetSubDirectoryClient);
        //this only returns one path that is a directory per directory, not one zero byte file and one directory                        
        //var x = targetDirectoryClient.GetPathsAsync();
        //await foreach (var y in x.AsPages())
        //{
        //    foreach (var z in y.Values)
        //    {
        //        Console.WriteLine(z);
        //    }
        //}
      }
      else if (true) //fileName.Contains("2023")) //filter here
      {
        var downloadPath = localTempPath + fileName;
        var sourceFileClient = sourceDirectoryClient.GetFileClient(fileName);
        var properties = await sourceFileClient.GetPropertiesAsync();
        // Run download, upload, and cleanup in order so the tracked task covers all three steps.
        tasks.Add(Task.Run(async () =>
        {
          await FileDownloader.DownloadFileAsync(sourceFileClient, downloadPath);
          await FileUploader.UploadFileAsync(targetDirectoryClient, properties, downloadPath);
          await DeleteFilesAsync(downloadPath); // this cleans up the local file
        }));
      }
    }
  }
  await Task.WhenAll(tasks);
}

2 Answers


  1. Chosen as BEST ANSWER

    This is more an explanation of why it didn't work than an answer.
    The root cause is that the storage account I was writing to was not Data Lake Storage; it was a plain Blob Storage account. It was not my account; I was just trying to move some files from my ADLS Gen 2 account to their storage account. The same code worked when I created sub directories on my own ADLS Gen 2 account. Hopefully this helps someone else.
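
    One way to confirm this before copying is to ask the target account whether hierarchical namespace is enabled. The sketch below is only an illustration: it assumes the Azure.Storage.Blobs package, a connection string stored in a hypothetical TARGET_STORAGE_CONNECTION_STRING environment variable, and credentials that are allowed to read account information. HnsCheck and Describe are made-up names, and the wording of the "flat namespace" message simply restates what was observed in this question.

    ```csharp
    using System;
    using System.Threading.Tasks;
    using Azure.Storage.Blobs;
    using Azure.Storage.Blobs.Models;

    public static class HnsCheck
    {
        // Pure helper (hypothetical): turns the flag into a readable verdict.
        public static string Describe(bool hnsEnabled) =>
            hnsEnabled
                ? "Hierarchical namespace enabled (ADLS Gen2)."
                : "Flat namespace (plain Blob Storage): zero-byte entries may appear for directories.";

        public static async Task Main()
        {
            // Assumption: the connection string lives in this environment variable.
            string connectionString =
                Environment.GetEnvironmentVariable("TARGET_STORAGE_CONNECTION_STRING")
                ?? throw new InvalidOperationException("TARGET_STORAGE_CONNECTION_STRING is not set.");

            var serviceClient = new BlobServiceClient(connectionString);

            // AccountInfo reports the SKU, the account kind, and the namespace flag.
            AccountInfo info = await serviceClient.GetAccountInfoAsync();
            Console.WriteLine(Describe(info.IsHierarchicalNamespaceEnabled));
        }
    }
    ```

    If the check reports a flat namespace, the copy code in the question is probably not at fault; the difference comes from the target account.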


  2. I reproduced this in my environment and got the expected results shown below:

    Before creating main and sub folder:

    [screenshot]

    To create a main folder and a subfolder with the same name, you can use the code below:

    using Azure.Storage.Files.DataLake;
    
    
    string constring = "DefaultEndpointsProtocol=https;AccountName=rithwikst;AccountKey=laUfWt+AStx9qP7Q==;EndpointSuffix=core.windows.net";
    string conname = "rithwik";
    string folderPath = "/myfolder/myfolder";
    
    
    var rithclient = new DataLakeFileSystemClient(constring, conname);
    rithclient.CreateDirectory(folderPath);
    Console.WriteLine("Folder and subfolder created Rithwik Bojja.");
    

    Output:

    [screenshots]

    If you want to create a subfolder with the same name inside an existing folder, use the code below:

    using Azure.Storage.Files.DataLake;
    
    
    string constring = "DefaultEndpointsProtocol=https;AccountName=rithwikst;AccountKey=laUfW9qP7Q==;EndpointSuffix=core.windows.net";
    string conname = "rithwik";
    string folderPath = "/myfold";
    string subFolderPath = $"{folderPath}/myfold"; 
    
    var rithclient = new DataLakeFileSystemClient(constring, conname);
    rithclient.CreateDirectory(subFolderPath);
    Console.WriteLine("Subfolder created Rithwik.");
    

    Output:

    [screenshots]

    This is how I create folders and subfolders in a storage account without getting extra 0-byte files.
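
    For code that is already asynchronous, like the function in the question, the same approach might look roughly like the sketch below. It assumes the Azure.Storage.Files.DataLake package and reads the connection string from a hypothetical STORAGE_CONNECTION_STRING environment variable instead of hardcoding the account key. CreateNestedDirectory and CombinePath are illustrative names; the file system name and folder names are the ones used above.

    ```csharp
    using System;
    using System.Threading.Tasks;
    using Azure.Storage.Files.DataLake;

    public static class CreateNestedDirectory
    {
        // Pure helper (hypothetical): joins parent and child with a single slash,
        // dropping any leading or trailing slashes on either part.
        public static string CombinePath(string parent, string child) =>
            $"{parent.Trim('/')}/{child.Trim('/')}";

        public static async Task Main()
        {
            // Assumption: the connection string lives in this environment variable.
            string connectionString =
                Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING")
                ?? throw new InvalidOperationException("STORAGE_CONNECTION_STRING is not set.");

            var fileSystemClient = new DataLakeFileSystemClient(connectionString, "rithwik");

            // Async counterpart of the CreateDirectory call used above.
            string subFolderPath = CombinePath("/myfold", "myfold");
            await fileSystemClient.CreateDirectoryAsync(subFolderPath);

            Console.WriteLine($"Subfolder created: {subFolderPath}");
        }
    }
    ```

    As with the synchronous version, this only avoids the extra 0-byte entries when the target account has hierarchical namespace enabled.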
