I am connecting to an on prem SQL Server database using the code below and it works as expected. I have wrapped the query I want to use in a view and I called the view to read the data into the dataframe. How do I explicitly specify a select query in the spark.read
options instead of using a full table or wrapping my select in a view? e.g. option("sqlcommand", "select col1, col2 from table1;")
df = (spark.read
.format("jdbc")
.option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
.option("url", "jdbc:sqlserver://our connection string")
.option("dbtable", "dbo.databrickstest")
.option("user",user)
.option("password", pwd)
.option("encrypt", "true")
.option("trustServerCertificate", "true")
.load()
)
2
Answers
You should use the
query
option instead of thedbtable
option. Use.option("query", "SELECT col1, col2 FROM table1")
to specify your SQL query. Here is the sample code for your reference:It will run the query successfully and give below output:
You are reading the entire table when you use:
To avoid this you will need to replace this line with:
Complete example:
reference from doc: https://spark.apache.org/docs/3.5.1/sql-data-sources-jdbc.html#data-source-option
Thanks for sharing your doubt and let me know if you need more help!