When I run the code below locally it works, but when I run it inside Azure Databricks it hangs forever and never finishes. I know the endpoint and SAS token are correct because they work locally, but the same code fails when run directly from an Azure Databricks notebook. Any ideas?
import com.azure.storage.blob.BlobClientBuilder
import java.io.InputStream

// Build a client for a single blob and open it as a stream
val input: InputStream = new BlobClientBuilder()
  .endpoint("https://<storage-account>.blob.core.windows.net")
  .sasToken("<sas-token>")
  .containerName("<container-name>")
  .blobName("<blob-name>")
  .buildClient()
  .openInputStream()
2 Answers
I solved this by using shaded jars (https://maven.apache.org/plugins/maven-shade-plugin/) within my app. This example walked me through setting that up: https://github.com/anuchandy/azure-sdk-in-data-bricks. See below for an updated example. Now I can prefix my import with the shaded group id that I created in my POM plugin config, so my code in Databricks knows exactly which dependency to use when reading from blob storage.

Azure Blob Storage Dependency:
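A minimal sketch of the dependency (the version shown is an assumption; use the release your project targets):

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-storage-blob</artifactId>
    <!-- version is an assumption; use the release your project targets -->
    <version>12.14.2</version>
</dependency>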
Maven Shade Plugin:
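A sketch of the shade plugin configuration; the com.mycompany.shaded relocation prefix is a made-up example, so substitute the prefix you chose:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.4</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <relocation>
                        <!-- Relocate the Azure SDK classes under a shaded prefix so the
                             copies bundled with Databricks cannot clash with ours -->
                        <pattern>com.azure</pattern>
                        <shadedPattern>com.mycompany.shaded.com.azure</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>

With that in place, the notebook import uses the shaded prefix (again, com.mycompany.shaded is just the example prefix from above):

import com.mycompany.shaded.com.azure.storage.blob.BlobClientBuilder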
Make sure to check whether Secure transfer is enabled. In the Azure portal, go to your storage account -> Settings -> Configuration, where you will find the Secure transfer setting; if it is disabled, enable it. Secure transfer protects your storage account by only allowing requests made over a secure connection.
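If you prefer the CLI to the portal, something along these lines should check and enable the flag (the account and resource group names are placeholders):

# Check whether secure transfer (HTTPS-only traffic) is enabled
az storage account show --name <storage-account> --resource-group <resource-group> --query enableHttpsTrafficOnly

# Enable it if it is not
az storage account update --name <storage-account> --resource-group <resource-group> --https-only true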
There are alternative options available for reading an Azure Blob Storage file directly from an Azure Databricks notebook:
Option 1: Accessing Azure Blob Storage from Azure Databricks by Gauri Mahajan
Option 2: Access Azure Blob storage using a SAS token, as documented by Microsoft (see the sketch after this list).
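For Option 2, a minimal sketch of the SAS approach from a Scala notebook using the wasbs connector; the container, account, blob path, and token are placeholders, and the file is assumed to be a CSV:

// Hand the SAS token to the wasbs driver for this container
spark.conf.set(
  "fs.azure.sas.<container-name>.<storage-account>.blob.core.windows.net",
  "<sas-token>")

// Read the blob directly into a DataFrame
val df = spark.read
  .option("header", "true")
  .csv("wasbs://<container-name>@<storage-account>.blob.core.windows.net/<blob-name>")

df.show()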