I am trying to access Azure Data Lake Storage Gen2 with a Service Principal via Unity Catalog.
- Managed Identity is added with Contributor Role assigned to the storage account
- Managed Identity is added as a Storage Credential
- the storage container is added as an external location with this credential
- the Service Principal is granted All Privileges on the external location (a rough sketch of this setup follows the list)
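For clarity, the Unity Catalog setup above roughly corresponds to the following, run via spark.sql from a notebook; the credential, location, container, account, and application ID are placeholders, not my real names:

# Sketch of the Unity Catalog setup described above, run from a notebook.
# Credential, location, container, account and application ID are placeholders.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS my_external_location
    URL 'abfss://my-container@mystorageaccount.dfs.core.windows.net/'
    WITH (STORAGE CREDENTIAL my_managed_identity_credential)
""")

spark.sql("""
    GRANT ALL PRIVILEGES ON EXTERNAL LOCATION my_external_location
    TO `00000000-0000-0000-0000-000000000000`  -- application ID of the Service Principal
""")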
In PySpark I set the Spark config according to the Azure Gen 2 documentation:
from pyspark.sql.types import StringType
spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net", f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
# create and write dataframe
df = spark.createDataFrame(["10","11","13"], StringType()).toDF("values")
(df.write
    .format("delta")
    .mode("overwrite")
    .save(f"abfss://{container}@{storage_account}.dfs.core.windows.net/example/example-0"))
Unfortunately this returns an unexpected error:
Operation failed: "This request is not authorized to perform this operation using this permission.", 403, HEAD, https://{storage-account}.dfs.core.windows.net/{container-name}/example/example-0?upn=false&action=getStatus&timeout=90
2 Answers
The above error mainly happens because the service principal does not have proper access to the storage account.
I tried to reproduce the same in my environment and got the same error.
To resolve the above error, please follow this approach:
First, go to your Azure Storage Account -> Containers -> Manage ACL.
Inside Manage ACL, add the service principal and grant it access permissions as shown in the image.
Now you can check that Azure Databricks connects to Azure Data Lake Gen2 and write a dataframe from Azure Databricks to Gen2.
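For example, a quick check from a notebook once the ACLs are in place could look like this (the storage account and container names are placeholders):

# Quick check once the ACLs are in place; storage_account and container
# are placeholders for your own values.
storage_account = "mystorageaccount"
container = "my-container"

# Listing the container should no longer raise a 403.
display(dbutils.fs.ls(f"abfss://{container}@{storage_account}.dfs.core.windows.net/"))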
When you use Unity Catalog you don't need these properties; they were only required before Unity Catalog, and are now used only on clusters without UC for direct data access.
Authentication to the given storage location happens by mapping the storage credential to the external location path.
But permissions are checked for the user or service principal who runs the given piece of code, so that user/principal needs the corresponding permissions on the external location. If you run this code as a job assigned to the service principal, it will have access. But if you run it as yourself, it won't work until you are granted those permissions.
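In other words, with Unity Catalog the write can go straight to the external location path, without any of the fs.azure.* settings; a minimal sketch (the storage account and container names are placeholders) looks like this:

# With Unity Catalog the storage credential behind the external location
# handles authentication, so no spark.conf OAuth settings are needed.
# storage_account and container below are placeholders.
from pyspark.sql.types import StringType

storage_account = "mystorageaccount"
container = "my-container"

df = spark.createDataFrame(["10", "11", "13"], StringType()).toDF("values")

(df.write
    .format("delta")
    .mode("overwrite")
    .save(f"abfss://{container}@{storage_account}.dfs.core.windows.net/example/example-0"))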