I have an Azure Synapse workspace (dedicated SQL pool) configured with a managed VNet in tenant A and a storage account in tenant B. The storage account is firewall-protected, and only certain VNets and IPs can access it. I want to create external tables from Azure Synapse and thereby access the storage account residing in the other tenant.
I have created a private endpoint on the storage account from Azure Synapse, and the necessary IAM roles are in place.
The external table is created, and I can retrieve the data when the firewall on the storage account is disabled.
However, when the storage account firewall is enabled, I get the following error:
Msg 105019, Level 16, State 1, Line 1
External file access failed due to internal error: 'Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:
HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: AbfsRestOperationException: Operation failed: "This request is not authorized to perform this operation.", 403, HEAD, https://someadlsl001.dfs.core.windows.net/somecontainer/?upn=false&action=getAccessControl&timeout=90'
The SQL statements used in the Synapse workspace SQL script are:
CREATE DATABASE SCOPED CREDENTIAL cred
WITH IDENTITY = '{clientID of service principal}@https://login.microsoftonline.com/{tenantID}/oauth2/token',
SECRET = 'xxxxxxxxxxxxxxxx';
CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH ( LOCATION = 'abfss://[email protected]/weather.csv' , CREDENTIAL = cred, TYPE = HADOOP ) ;
CREATE EXTERNAL TABLE [dbo].[WeatherData2] (
[usaf] [nvarchar](100) NULL
)
WITH
(
LOCATION='/',
DATA_SOURCE = AzureDataLakeStore,
FILE_FORMAT = csvFile,
REJECT_TYPE = VALUE,
REJECT_VALUE = 0
);
SELECT * FROM [dbo].[WeatherData2];
Please help
2 Answers
You can retrieve the data when the storage account firewall is disabled, which suggests there is an issue with the role assignment.
Make sure the Storage Blob Data Contributor role is assigned to the service principal.
Also make sure the IP address is whitelisted in the storage account firewall.
Reference – https://learn.microsoft.com/en-us/answers/questions/648148/spark-pool-notebook-error.html
Configure Azure Storage firewalls and virtual networks
You mention your firewall is configured to allow only certain VNets & IPs. You might need to elaborate for us on what your rules are specifically, but the documentation is very clear on how this is configured when accessing the storage account from another tenant.
AZ CLI:
Az Powershell:
And this might be stating the obvious, but any IP range in the IP address whitelist only applies to the public endpoints of the storage account. Keep that in mind if you’re trying to whitelist resources across tenants or on-premises.
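For illustration, the cross-tenant virtual network rule described in the linked documentation could look roughly like the sketch below with the Azure CLI. Every value is a hypothetical placeholder (the subscription ID, resource groups, VNet and subnet names are not from the question), and it assumes the subnet is one you control and that it has the Microsoft.Storage service endpoint enabled.

```shell
# Hypothetical placeholders: replace with real values from tenant A (subnet) and tenant B (storage).
SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"  # subscription that owns the VNet (tenant A)
VNET_RG="rg-network"
VNET_NAME="vnet-tenant-a"
SUBNET_NAME="snet-data"
STORAGE_RG="rg-storage"          # resource group of the storage account (tenant B)
STORAGE_ACCOUNT="someadlsl001"   # account name as it appears in the error message in the question

# A cross-tenant rule needs the fully qualified subnet resource ID, not just the subnet name.
SUBNET_ID="/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/${VNET_RG}/providers/Microsoft.Network/virtualNetworks/${VNET_NAME}/subnets/${SUBNET_NAME}"

# Run while signed in to tenant B; skipped when the Azure CLI is not installed.
if command -v az >/dev/null 2>&1; then
  az storage account network-rule add \
    --resource-group "$STORAGE_RG" \
    --account-name "$STORAGE_ACCOUNT" \
    --subnet "$SUBNET_ID"
else
  echo "Azure CLI (az) not found; nothing was changed." >&2
fi
```

Afterwards, `az storage account network-rule list --resource-group "$STORAGE_RG" --account-name "$STORAGE_ACCOUNT"` should show the new rule among the virtual network rules.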