I am having a bit of trouble with Synapse notebooks. I want to get a list of blobs via a PySpark script so I can dynamically decide which files to integrate.
I cannot make this work in Synapse. In other environments, such as a Jupyter notebook, the code works as expected.
from azure.storage.blob import ContainerClient, BlobServiceClient,AccountSasPermissions, ResourceTypes
from azure.storage.blob._shared_access_signature import SharedAccessSignature,BlobSharedAccessSignature
sas_token = 'hardcoded_value'
# the container URL and the SAS token must be separated by '?'
account_url1 = 'https://storage_account.blob.core.windows.net/container' + sas_token
print(account_url1)
container_client = ContainerClient.from_container_url(container_url=account_url1)
source_blob_list = container_client.list_blobs()
for blob in source_blob_list:
    print(blob.name)
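The code above builds the URL by plain string concatenation, which only works if the SAS token already starts with '?'; otherwise the token is glued onto the container name. As a minimal sketch (build_container_sas_url is a hypothetical helper, not part of the Azure SDK), the pieces can be joined defensively. It also checks the account name against the Azure naming rule (3 to 24 lowercase letters and digits), so a placeholder such as storage_account, which contains an underscore, is rejected instead of producing a host name that cannot exist.

```python
import re
from urllib.parse import urlsplit

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def build_container_sas_url(account: str, container: str, sas_token: str) -> str:
    """Join account, container and SAS token into a container URL.

    The SAS token is accepted with or without its leading '?'.
    """
    if not _ACCOUNT_NAME.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    token = sas_token.lstrip("?")  # avoid '??' if the token already has it
    return f"https://{account}.blob.core.windows.net/{container}?{token}"


url = build_container_sas_url("mystorageacct", "container", "sv=2022-11-02&sig=abc")
print(urlsplit(url).hostname)  # mystorageacct.blob.core.windows.net
```

The resulting string can be passed directly to ContainerClient.from_container_url in place of account_url1.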
The output from the code above in Synapse is:
ServiceRequestError: <urllib3.connection.HTTPSConnection object at 0x7f282242e130>: Failed to establish a new connection: [Errno -2] Name or service not known
In a Jupyter notebook, the same code produces the expected output.
I have the Storage Blob Data Contributor role assigned to my user and to the Synapse user as well.
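"[Errno -2] Name or service not known" is raised when DNS resolution of the host fails, before any authentication takes place, so it points at networking or at the URL rather than at role assignments. A minimal sketch to check, from inside the Synapse notebook, whether the storage host resolves; storage_host and can_resolve are hypothetical helper names built only on the Python standard library.

```python
import socket
from urllib.parse import urlsplit


def storage_host(url: str) -> str:
    """Return the host name part of a blob or container URL."""
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"no host name in URL: {url!r}")
    return host


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if DNS resolution of `host` succeeds from this machine."""
    try:
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Same failure class as "[Errno -2] Name or service not known".
        return False
    except UnicodeError:
        # Raised for host names that cannot be IDNA-encoded.
        return False
    return True
```

For example, can_resolve(storage_host(account_url1)) returning False in Synapse but True in Jupyter would suggest the workspace's network setup (for instance a managed virtual network without a private endpoint to the storage account) rather than the code itself.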
Answers
In the end, it was a permissions issue with the Synapse managed identity. As I stated, the code above was working outside of Synapse. Now that we have added permissions to the Synapse managed private endpoint, everything is working. Thank you!
The above error mainly happens because of invalid URL syntax: "Name or service not known" means the host name in the URL could not be resolved.
Please follow the syntax below to get the list of blob files:
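Since the goal in the question is to decide dynamically which files to integrate, here is a minimal sketch that lists blob names through the azure-storage-blob calls ContainerClient.from_container_url and list_blobs(name_starts_with=...), then filters them with a shell-style pattern. The helper names select_blobs and list_blob_names and the example pattern are hypothetical; container_url is assumed to be a full container URL including its SAS token, like account_url1 in the question.

```python
import fnmatch
from typing import Iterable, List


def select_blobs(names: Iterable[str], pattern: str) -> List[str]:
    """Keep only blob names matching a shell-style pattern, sorted.

    Note: in fnmatch patterns '*' also matches '/'.
    """
    return sorted(n for n in names if fnmatch.fnmatchcase(n, pattern))


def list_blob_names(container_url: str, prefix: str = "") -> List[str]:
    """List blob names in a container addressed by a full container SAS URL.

    Requires the azure-storage-blob package; it is imported here so that
    select_blobs can be used without it.
    """
    from azure.storage.blob import ContainerClient

    client = ContainerClient.from_container_url(container_url)
    # name_starts_with=None lists every blob in the container.
    return [b.name for b in client.list_blobs(name_starts_with=prefix or None)]
```

For example, select_blobs(list_blob_names(account_url1, prefix="raw/"), "raw/*.csv") would return the sorted CSV blob names under raw/, which can then drive the PySpark reads.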
For more information, refer to this Microsoft document.