
I am trying to access Azure Data Lake Storage Gen2 with a Service Principal via Unity Catalog.

  • the Managed Identity is assigned the Contributor role on the storage account
  • the Managed Identity is added as a Storage Credential
  • the storage container is added as an external location with this credential
  • the Service Principal is granted All Privileges on the external location (see the sketch after this list)
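
For reference, a rough sketch of the Unity Catalog side of this setup as it could be run from a notebook; `my_adls_credential`, `my_external_location`, and the all-zero application ID are placeholder names, and the statements assume they are run by a metastore admin or the respective owner:

# Sketch only - placeholder object names; {container} and {storage_account} as in the code below
spark.sql(f"""
  CREATE EXTERNAL LOCATION IF NOT EXISTS my_external_location
  URL 'abfss://{container}@{storage_account}.dfs.core.windows.net/'
  WITH (STORAGE CREDENTIAL my_adls_credential)
""")

# Grant the Service Principal (referenced by its application ID) access on the location
spark.sql("""
  GRANT ALL PRIVILEGES ON EXTERNAL LOCATION my_external_location
  TO `00000000-0000-0000-0000-000000000000`
""")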

In PySpark I set the Spark config according to the Azure Gen 2 documentation:

from pyspark.sql.types import StringType

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net", f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

# create and write dataframe
df = spark.createDataFrame(["10","11","13"], StringType()).toDF("values")
df.write \
  .format("delta") \
  .mode("overwrite") \
  .save(f"abfss://{container}@{storage_account}.dfs.core.windows.net/example/example-0")

Unfortunately this returns an unexpected error:

Operation failed: "This request is not authorized to perform this operation using this permission.", 403, HEAD, https://{storage-account}.dfs.core.windows.net/{container-name}/example/example-0?upn=false&action=getStatus&timeout=90

2 Answers


  1. Operation failed: "This request is not authorized to perform this operation using this permission.", 403, HEAD, https://{storage account}.dfs.core.windows.net/{container-name}/example/example-0?upn=false.

    The above error mainly happens because the service principal does not have proper access to the storage account.

    I tried to reproduce the same in my environment and got the same error.


    To resolve the above error, follow this approach:

    First go to your Azure Storage Account -> Containers -> Manage ACL


    Inside Manage ACL, add the service principal and give it the required access permissions (read, write, execute).

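    If you prefer to script this instead of clicking through the portal, below is a minimal sketch using the azure-identity and azure-storage-file-datalake packages. All names and IDs are placeholders, it assumes the caller is allowed to change ACLs on the container, and it assumes get_directory_client("/") addresses the container root in your SDK version:

    from azure.identity import ClientSecretCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    # Placeholder values - replace with your own tenant, app registration and account
    credential = ClientSecretCredential("<tenant_id>", "<admin_client_id>", "<admin_client_secret>")
    service = DataLakeServiceClient(
        account_url="https://<storage_account>.dfs.core.windows.net",
        credential=credential,
    )

    # Merge an rwx ACL entry for the service principal (by its object ID) into the
    # container root and everything under it (assumed root path "/")
    root = service.get_file_system_client("<container>").get_directory_client("/")
    root.update_access_control_recursive(acl="user:<sp_object_id>:rwx")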

    Now you can verify that Azure Databricks can connect to Azure Data Lake Storage Gen2:

    spark.conf.set("fs.azure.account.auth.type.<storage_account>.dfs.core.windows.net", "OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type.<storage_account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id.<storage_account>.dfs.core.windows.net", "<client_id>")
    spark.conf.set("fs.azure.account.oauth2.client.secret.<storage_account>.dfs.core.windows.net", "<client_secret>")
    spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage_account>.dfs.core.windows.net", "https://login.microsoftonline.com/<tenant_id>/oauth2/token")
    
       
    from pyspark.sql.types import StringType
    df = spark.createDataFrame(["10","11","13"], StringType()).toDF("values")
    
    display(df)
    


    Write the DataFrame from Azure Databricks to Gen2:

    df.write \
      .format("delta") \
      .mode("overwrite") \
      .save("abfss://<container>@<storage_account>.dfs.core.windows.net/example/example-0")
    


  2. When you use Unity Catalog you don’t need these properties. They were needed prior to Unity Catalog and are not used anymore, or are used only on clusters without UC for direct data access:

    spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
    spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
    spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net", f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
    

    Authentication to the given storage location happens by mapping the storage credential to the external location path.

    But permissions are checked for the user/service principal who runs a given piece of code, so this user/principal must have the corresponding permissions on the external location. If you run this code as a job assigned to the service principal, it will have access. But if you run it as yourself, it won’t work until you are granted those permissions.
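
    For illustration, a rough sketch of what this looks like under Unity Catalog, with no spark.conf OAuth settings at all; `my_external_location` is a placeholder name and df, container and storage_account are reused from the question:

    # Check which principals hold which privileges on the external location
    display(spark.sql("SHOW GRANTS ON EXTERNAL LOCATION `my_external_location`"))

    # With the right grant in place, the write needs no OAuth Spark configs:
    # UC resolves the abfss path against the external location and authenticates
    # with its storage credential
    df.write \
      .format("delta") \
      .mode("overwrite") \
      .save(f"abfss://{container}@{storage_account}.dfs.core.windows.net/example/example-0")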
