
I am connecting to an on-prem SQL Server database using the code below, and it works as expected. I have wrapped the query I want to use in a view and call the view to read the data into the DataFrame. How do I explicitly specify a SELECT query in the spark.read options instead of reading a full table or wrapping my SELECT in a view? e.g. option("sqlcommand", "select col1, col2 from table1;")

df = (spark.read 
        .format("jdbc") 
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") 
        .option("url", "jdbc:sqlserver://our connection string")     
        .option("dbtable", "dbo.databrickstest") 
        .option("user",user) 
        .option("password", pwd) 
        .option("encrypt", "true") 
        .option("trustServerCertificate", "true") 
        .load()
)

2 Answers


  1. You should use the query option instead of the dbtable option. Use .option("query", "SELECT col1, col2 FROM table1") to specify your SQL query. Here is the sample code for your reference:

    jdbcHostname = "<serverName>"     
    jdbcPort = 1433    
    jdbcDatabase = "<dbName>"     
    jdbcUsername = "<userName>"             
    jdbcPassword = "<password>"            
    jdbcDriver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"     
     
    
    jdbcUrl = f"jdbc:sqlserver://{jdbcHostname}:{jdbcPort};databaseName={jdbcDatabase}"
    df = (spark.read
            .format("jdbc")
            .option("driver", jdbcDriver)
            .option("url", jdbcUrl)
            .option("query", "select EmployeeID, FirstName from Employees")
            .option("user", jdbcUsername)
            .option("password", jdbcPassword)
            .load()
    )
    df.show()
    

    It will run the query successfully and give the output below:

    +----------+---------+
    |EmployeeID|FirstName|
    +----------+---------+
    |         1|     John|
    |         2|     Jane|
    |         3|     Mike|
    |         4|    Emily|
    +----------+---------+
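    A related trick: the value of the dbtable option is substituted into the FROM clause of the SQL that Spark generates, so a parenthesized subquery with an alias also works as an alternative to the query option. A minimal sketch (all connection details below are placeholders):

    ```python
    # Sketch: instead of option("query", ...), a derived table can be
    # passed through dbtable. Spark inlines this value into the FROM
    # clause of the generated SQL, so the alias ("AS t") is required.
    subquery = "(SELECT col1, col2 FROM table1) AS t"

    # Placeholder connection details -- replace with your own.
    reader_options = {
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "url": "jdbc:sqlserver://<host>:1433;databaseName=<dbName>",
        "dbtable": subquery,  # used instead of the "query" option
        "user": "<userName>",
        "password": "<password>",
    }

    # df = spark.read.format("jdbc").options(**reader_options).load()
    ```

    Note that query and dbtable are mutually exclusive; pick one or the other.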
  2. You are reading the entire table when you use:

    .option("dbtable", "dbo.databrickstest")
    

    To avoid this, replace that line with:

    .option("query", query_you_want)
    

    Complete example:

    query = """
        SELECT
            columns
        FROM table
        WHERE condition
    """
    df = (spark.read
        .format("jdbc")
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .option("url", "jdbc:sqlserver://our connection string")
        .option("query", query)
        .option("user", user)
        .option("password", pwd)
        .option("encrypt", "true")
        .option("trustServerCertificate", "true")
        .load()
    )
    

    Reference from the docs: https://spark.apache.org/docs/3.5.1/sql-data-sources-jdbc.html#data-source-option
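    One caveat from that options page: query cannot be combined with dbtable, and it also cannot be used together with partitionColumn. If you need a partitioned (parallel) read of a select, put the select into dbtable as a derived table instead. A sketch, with illustrative placeholder bounds:

    ```python
    # Sketch: query + partitionColumn is rejected by Spark, so for a
    # parallel read the select goes into dbtable as a derived table.
    # The bounds and partition count below are placeholders.
    partitioned_options = {
        "dbtable": "(SELECT EmployeeID, FirstName FROM Employees) AS emp",
        "partitionColumn": "EmployeeID",  # must be numeric, date, or timestamp
        "lowerBound": "1",
        "upperBound": "1000",
        "numPartitions": "4",
    }

    # df = (spark.read.format("jdbc")
    #         .option("url", jdbcUrl)
    #         .options(**partitioned_options)
    #         .option("user", jdbcUsername)
    #         .option("password", jdbcPassword)
    #         .load())
    ```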

    Thanks for sharing your question, and let me know if you need more help!
