
I am having a bit of trouble with Synapse notebooks. I want to get a list of blobs via a PySpark script so that I can dynamically decide which files to integrate.
I cannot make this work in Synapse. In other environments, such as a Jupyter notebook, the code works as expected.

from azure.storage.blob import ContainerClient

# The SAS token is appended directly to the URL, so it must begin with '?'
sas_token = 'hardcoded_value'

account_url1 = 'https://storage_account.blob.core.windows.net/container' + sas_token

print(account_url1)
container_client = ContainerClient.from_container_url(container_url=account_url1)

# list_blobs() returns a lazy iterator; the HTTP request is sent on the first iteration
source_blob_list = container_client.list_blobs()
for blob in source_blob_list:
    print(blob.name)  # print() already appends a newline
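Because the code builds the container URL by string concatenation, a missing `?` before the SAS token or an invalid account name produces a bad URL. A minimal sketch that makes both requirements explicit, assuming `sas_token` is the query-string part of a SAS URL (with or without its leading `?`); `build_container_url` is a hypothetical helper, not part of the Azure SDK. It applies Azure's storage account naming rule of 3 to 24 lowercase letters and digits.

```python
import re

# Azure storage account names: 3 to 24 characters, lowercase letters and digits only
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def build_container_url(account_name: str, container: str, sas_token: str) -> str:
    """Hypothetical helper: join account, container and SAS token into a container URL."""
    if not _ACCOUNT_NAME.fullmatch(account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    # Accept the token with or without its leading '?' and add exactly one
    token = sas_token.lstrip("?")
    return f"https://{account_name}.blob.core.windows.net/{container}?{token}"
```

The returned string could then be passed to `ContainerClient.from_container_url` in place of `account_url1`.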

The output from the code above in Synapse is:

ServiceRequestError: <urllib3.connection.HTTPSConnection object at 0x7f282242e130>: Failed to establish a new connection: [Errno -2] Name or service not known
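`[Errno -2] Name or service not known` is a DNS resolution failure: the notebook could not resolve the storage hostname to an IP address, so no HTTP request reached the storage account. One way to confirm this from a notebook cell is to resolve the hostname with Python's standard `socket` module. A minimal sketch; `resolve_host` is a hypothetical helper name.

```python
from __future__ import annotations

import socket
from urllib.parse import urlsplit


def resolve_host(url: str) -> list[str]:
    """Return the IP addresses that the URL's hostname resolves to ([] on failure)."""
    host = urlsplit(url).hostname
    if not host:
        return []
    try:
        infos = socket.getaddrinfo(host, 443, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # "[Errno -2] Name or service not known" is raised as socket.gaierror
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_host(account_url1)` in both environments is a quick comparison: if it returns an empty list in Synapse but addresses in Jupyter, the problem lies in name resolution or network access from the Synapse environment rather than in the Python code.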

The output from the same code in a Jupyter notebook is as expected.


I have the Storage Blob Data Contributor role assigned to my user and to the Synapse user as well.


Answers


  1. Chosen as BEST ANSWER

    In the end, the problem was permissions for the managed identity of Synapse. As I stated, the code above was working outside of Synapse. Once we added permissions to the managed private endpoint of Synapse, everything worked. Thank you!


  2. This error usually occurs because the URL syntax is invalid.

    Use the following syntax to get the list of blob files:

    mssparkutils.fs.ls('wasbs://<container_name>@<Storage_account_name>.blob.core.windows.net/')
    


    For more information, refer to this Microsoft document.
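The question's goal is to decide dynamically which files to integrate. Assuming `mssparkutils.fs.ls` returns entries that expose `name` and `path` attributes, a minimal sketch of building a path in the `wasbs://` format from the second answer and filtering the listing by file extension could look like this. `wasbs_path` and `select_paths` are hypothetical helper names.

```python
from __future__ import annotations


def wasbs_path(container: str, account: str, folder: str = "") -> str:
    """Build a path in the wasbs://<container>@<account>.blob.core.windows.net/ format."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{folder.lstrip('/')}"


def select_paths(entries, suffix: str) -> list[str]:
    """Return the `path` of each entry whose `name` ends with `suffix`.

    `entries` is assumed to be the list returned by mssparkutils.fs.ls,
    whose items are assumed to have `name` and `path` attributes.
    """
    return [entry.path for entry in entries if entry.name.endswith(suffix)]
```

In a Synapse notebook this might be called as `select_paths(mssparkutils.fs.ls(wasbs_path("container", "account")), ".csv")`; filtering on the name suffix keeps the selection logic independent of how the listing was produced.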

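The accepted answer traced the failure to permissions on Synapse's managed identity and managed private endpoint. Assuming the workspace reaches the storage account through a managed private endpoint, one would expect the storage hostname to resolve to private IP addresses from inside the workspace. Given the resolved addresses (obtained, for example, with `socket.getaddrinfo`), a minimal sketch of that check uses Python's `ipaddress` module; `all_private` is a hypothetical helper.

```python
from __future__ import annotations

import ipaddress


def all_private(addresses: list[str]) -> bool:
    """True if the list is non-empty and every address is in a private range.

    Private ranges include 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16.
    """
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)
```

A result of `False` from inside the workspace would suggest the request is not going through a private endpoint, under the managed-network assumption above.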