
In Azure Databricks, I need to connect to a table in SQL Server over ODBC, create a dataframe from it, and create exactly the same table in Databricks.
I have already written the code below and can see the output in the attached screenshot. How do I get the structured table and create a dataframe with the same info?
Could you please help with the scripts?

Below is what I have now:

%pip install pyodbc

import pyodbc

server = 'XXXXXX-prod-db.database.chinacloudapi.cn'
database = 'database-ods'
username = 'os_reader'
password = 'P@ssword'
table = 'table_record'

# Connect to SQL Server over ODBC
cnxn = pyodbc.connect(
    f'DRIVER={{ODBC Driver 17 for SQL Server}};'
    f'SERVER={server};'
    f'DATABASE={database};'
    f'UID={username};'
    f'PWD={password};'
)

cursor = cnxn.cursor()
query = f"SELECT * FROM {table}"
cursor.execute(query)
rows = cursor.fetchall()
print(rows)

This is what I can see now:

[screenshot of the printed rows]

2 Answers

  1. You can follow this documentation for getting data.

    In the documentation, there are two options for getting data:

    1. Using the built-in sqlserver connector (see the sketch after this list).
    2. Using the JDBC driver.
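
    As a minimal sketch of the first option (assuming a recent Databricks runtime, 11.3 LTS or above, where the sqlserver connector is built in; the server, database, table, and credentials are taken from the question):

    remote_df = (
        spark.read.format("sqlserver")
        .option("host", "XXXXXX-prod-db.database.chinacloudapi.cn")
        .option("port", "1433")
        .option("database", "database-ods")
        .option("dbtable", "table_record")
        .option("user", "os_reader")
        .option("password", "P@ssword")
        .load()
    )
    remote_df.display()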

    However, if you want to use ODBC itself, as @Anupam Chand mentioned, you will not get the column names.

    But, you can achieve this using pandas:

    import pandas as pd

    # Read the query result (including column names) into a pandas dataframe
    p_df = pd.read_sql_query("SELECT * FROM student", cnxn)

    # Convert it into a Spark dataframe
    s_df = spark.createDataFrame(p_df)
    s_df.display()
    

    Here, the data is fetched into a pandas dataframe and then converted into a Spark dataframe. The general pattern is:

    pd.read_sql_query("<query>", <connection_object>)

    Output:

    [screenshot of the resulting dataframe with column names]
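
    To also create exactly the same table in Databricks, as the question asks, you can save the Spark dataframe as a managed table. A minimal sketch, assuming you have write access to the current catalog and schema (the table name is reused from the question):

    # Creates a managed (Delta) table whose schema is inferred from the dataframe
    s_df.write.mode("overwrite").saveAsTable("table_record")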

  2. You've tagged MySQL but the title says SQL Server. Either way, both MySQL and SQL Server support JDBC.

    Spark supports JDBC as a data source.

    TL;DR

    jdbcDF = (
        spark.read
        .format("jdbc")
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .option("url", "jdbc:sqlserver://XXXXXX-prod-db.database.chinacloudapi.cn;databaseName=database-ods")
        .option("query", "SELECT * FROM table_record")
        # .option("dbtable", "table_record")  # alternative to "query": reads the whole table
        .option("user", "os_reader")
        .option("password", "P@ssword")
        .load()
    )
    
    jdbcDF.show()
    

    While running it, you need to ensure the JDBC driver is installed on your cluster, e.g.:

    spark-submit --packages com.microsoft.sqlserver:mssql-jdbc:12.4.2.jre11 <your-script.py>
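
    On Azure Databricks itself, the usual route is to attach the driver to the cluster as a Maven library (coordinate com.microsoft.sqlserver:mssql-jdbc:12.4.2.jre11 under the cluster's Libraries tab). Recent Databricks runtimes also bundle a SQL Server JDBC driver, so an explicit install is often unnecessary.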
    