I am trying to connect to Azure Serverless SQL using Python but I am getting the following error:
External table 'dbo.my_table' is not accessible because location does not exist or it is used by another process.
I created the external table using the following commands.
First, I created a database scoped credential:
CREATE DATABASE SCOPED CREDENTIAL python_access_credential WITH IDENTITY = 'SHARED ACCESS SIGNATURE', SECRET = '<SAS TOKEN>';
Then I created the external data source with the above database scoped credential using the following command:
CREATE EXTERNAL DATA SOURCE python_data WITH (LOCATION = 'abfss://<StorageAccount>.dfs.core.windows.net', CREDENTIAL = [python_access_credential]);
Then I created the external table
CREATE EXTERNAL TABLE my_table ([C1] bigint, [C2] nvarchar(4000)) WITH (LOCATION = 'data.tsv', DATA_SOURCE = [python_data], FILE_FORMAT = [SynapseDelimitedTextFormat]);
Then I ran the following command to grant access to the user used in my Python script:
GRANT REFERENCES ON DATABASE SCOPED CREDENTIAL::[python_access_credential] TO [user];
Also, the same error occurs when I try to access the table via Synapse Studio.
I have referred to many Azure documents, Microsoft Q&A pages, and one other Stack Overflow question, but couldn't find a solution. Most answers pointed towards creating a DATABASE SCOPED CREDENTIAL and passing it in the CREDENTIAL parameter while creating the external data source, which I have already done.
2 Answers
I was able to solve the problem by using a Managed Identity credential instead of a Shared Access Signature. I ran the following code to create the credential and the external table.
Create a master key first.
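Something like this; substitute your own strong password:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword123!>';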
Now create a Managed Identity database scoped credential
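A sketch, reusing the credential name from the question; drop the old SAS-based credential first if it still exists:

CREATE DATABASE SCOPED CREDENTIAL python_access_credential
WITH IDENTITY = 'Managed Identity';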
Create a login with a password. You will use it in your Python script to run queries.
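Something like the following; the login name and password are placeholders, and logins live in the master database:

-- Run in the master database of the serverless pool.
CREATE LOGIN python_login WITH PASSWORD = '<StrongPassword123!>';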
The above login is to be used as the UID in the Python script. Now create a user for that login.
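A sketch, assuming the login name above:

-- Run in the database that holds the external table, not master.
CREATE USER python_user FOR LOGIN python_login;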
Now create an external data source with the above credential and use that data source to create an external table.
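A sketch based on the definitions from the question; note that an abfss:// location normally includes the container name in front of the storage account:

CREATE EXTERNAL DATA SOURCE python_data
WITH (
    LOCATION = 'abfss://<container>@<StorageAccount>.dfs.core.windows.net',
    CREDENTIAL = [python_access_credential]
);

CREATE EXTERNAL TABLE my_table (
    [C1] bigint,
    [C2] nvarchar(4000)
)
WITH (
    LOCATION = 'data.tsv',
    DATA_SOURCE = [python_data],
    FILE_FORMAT = [SynapseDelimitedTextFormat]
);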
Now, as the final step, grant SELECT access.
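GRANT SELECT ON OBJECT::dbo.my_table TO python_user;

With that, the Python side is a plain SQL-authentication connection, with the login above as the UID. A minimal pyodbc sketch; the server name, database, and password are placeholders:

import pyodbc

# Placeholders: substitute your serverless endpoint, database, and the password from above.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>-ondemand.sql.azuresynapse.net;"
    "DATABASE=<your_database>;"
    "UID=python_login;"
    "PWD=<StrongPassword123!>;"
    "Encrypt=yes;"
)
rows = conn.cursor().execute("SELECT TOP 10 * FROM dbo.my_table;").fetchall()
print(rows)
conn.close()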
I suggest you create a linked dataset and use the GUI to do all this work for you. Today, I am going to walk through the process for the [sales_lt] database and the [address] table that is stored in the data lake in Parquet format.
There are two types of workspace databases you can create. If you are not using Spark SQL, select the first type.
Click the plus icon and add a SQL database called [stack_overflow] to the serverless pools. This type of pool is great since you only get charged when you query the data.
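If you prefer T-SQL over the GUI, the equivalent is a one-liner run against the serverless endpoint:

CREATE DATABASE stack_overflow;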
We can see that our new database, [stack_overflow], now exists.
If you are familiar with Azure Data Factory, configure a linked service that connects to your data storage and browse to the file that you want to convert into a table. Right-click the file and choose the "create external table" menu option.
Please see the Microsoft article on how to create a linked service:
https://learn.microsoft.com/en-us/azure/data-factory/concepts-linked-services?tabs=data-factory
It will ask you for the default text field size, which is 4000 bytes.
Next, choose to create and open a script. There is a bug in the GUI interface: while it states you can create a new schema and a new table within the data entry fields, it does not generate the code for the schema.
Just add a CREATE SCHEMA statement at the top of the generated script and prefix the table name with the schema called [sales_lt]; a corrected sketch is shown below. Please note, this produces code like what you showed above without any real work on your part.
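The corrected script looks roughly like this. It is a sketch: the column list and location are whatever the GUI generated for your file, and the data source name here is a placeholder:

CREATE SCHEMA sales_lt;
GO

CREATE EXTERNAL TABLE sales_lt.address (
    [AddressID] int,
    [City] nvarchar(4000)
    -- remaining columns as generated by the GUI
)
WITH (
    LOCATION = 'sales_lt/address/*.parquet',
    DATA_SOURCE = [my_linked_storage],   -- placeholder for your data source name
    FILE_FORMAT = [SynapseParquetFormat]
);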
We need to add an Azure AD user to the Synapse workspace and give it rights to view the table. I am going to start with sysadmin rights; you can reduce the rights once you have it working.
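A sketch of the T-SQL; the account name is a placeholder for your Azure AD user:

-- Run in the master database of the serverless pool.
CREATE LOGIN [user@yourdomain.com] FROM EXTERNAL PROVIDER;

-- Start broad for testing; reduce rights once the connection works.
ALTER SERVER ROLE sysadmin ADD MEMBER [user@yourdomain.com];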
Find the endpoint for the Synapse serverless pool. In short, the service looks and acts like SQL Server, with the tabular data stream (TDS) as the wire format.
We can use the latest driver from Microsoft and write Python code to connect to the database. We need to use the Active Directory with password authentication option, since this walkthrough connects as an Azure AD user.
Please see the Microsoft documentation on connecting with Azure Active Directory authentication:
https://learn.microsoft.com/en-us/sql/connect/jdbc/connecting-using-azure-active-directory-authentication?view=sql-server-ver16
I am a DBA by trade, though I am crafty with other languages. I am going to test the connection first using SQL Server Management Studio (SSMS).
Enter the fully qualified address of the Synapse serverless pool. Add the user, [email protected], and the correct password.
The Advanced tab allows you to make connection changes. Make sure TCP/IP is used as the network protocol and the default database is [stack_overflow].
Last but not least, a SELECT statement retrieves data from the Synapse serverless pool and returns it to the client, SSMS. In short, the solution works. Now we just need to code the connection and data retrieval using Python.
Here is a link on how to use the latest ODBC driver for SQL Server with Python:
https://datatofish.com/how-to-connect-python-to-sql-server-using-pyodbc/
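Following that pattern, a minimal sketch using pyodbc with the Active Directory password option; the server, user, and password are placeholders:

import pyodbc

# Placeholders: substitute your serverless endpoint and Azure AD credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>-ondemand.sql.azuresynapse.net;"
    "DATABASE=stack_overflow;"
    "UID=<user>@<yourdomain>.com;"
    "PWD=<password>;"
    "Authentication=ActiveDirectoryPassword;"
    "Encrypt=yes;"
)
cursor = conn.cursor()
# The same SELECT that worked in SSMS.
cursor.execute("SELECT TOP 10 * FROM sales_lt.address;")
for row in cursor.fetchall():
    print(row)
conn.close()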
Please remember, serverless pools provide read access to files in the data lake. These files look like tables in a database. No write options are supported with this service.