I am trying to connect to an external private MySQL Database from Azure Databricks.
I have tried with sqlalchemy and pymysql:

pymysql:

    import pymysql

    conn = pymysql.connect(
        host='.....',
        user='.....',
        password='....',
        db='....')
    cur = conn.cursor()
sqlalchemy:

    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://{0}:{1}@hostname:3306/database".format('user', 'password'))
    connection = engine.connect()
The error was the same in both cases:

    OperationalError: (pymysql.err.OperationalError) (1045, "Access denied for user 'user'@'IP_address' (using password: YES)")
[The external database was configured for remote access and the IPs we were trying to access it with were whitelisted]
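One thing worth ruling out with SQLAlchemy specifically: if the password (or username) contains characters that are special in a URL, such as `@`, `:`, or `/`, an unescaped connection string is parsed incorrectly and the server can reject the mangled credentials with exactly this 1045 error. A minimal sketch of percent-encoding the credentials before building the URL (the password here is hypothetical):

```python
from urllib.parse import quote_plus

# Hypothetical password containing characters that are special in a URL.
password = "p@ss:word/123"

# Percent-encode the credentials before interpolating them into the
# SQLAlchemy URL; an unescaped '@' or ':' corrupts the connection string.
url = "mysql+pymysql://{0}:{1}@hostname:3306/database".format("user", quote_plus(password))
print(url)
# The only literal '@' left in the URL is the one separating
# credentials from the hostname.
```

If the password is plain alphanumeric this makes no difference, but it is a cheap check before digging into server-side configuration.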
With mysql.connector the connection was successful:

    from mysql.connector import connect

    connection = connect(
        host=dbutils.secrets.get(scope="resource_group", key="key"),
        port=dbutils.secrets.get(scope="resource_group", key="key"),
        user=dbutils.secrets.get(scope="resource_group", key="key"),
        password=dbutils.secrets.get(scope="resource_group", key="key"),
        database="db")
    cursor = connection.cursor()
The questions are:

Does anyone know why sqlalchemy and pymysql did not work?

Are there only certain Python SQL connector libraries that work in Databricks, and if so, why?
2 Answers
There are various ways to connect to a MySQL database from Spark using Python; common approaches include pymysql, SQLAlchemy, and mysql-connector-python.
Note: for a Python environment, the mysql-connector-python library is recommended.

In Databricks Runtime 11.2 and above you can use the MySQL connector natively. The code could be like this:
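A sketch of the native connector syntax (Databricks Query Federation), assuming a Databricks Runtime 11.2+ cluster; the hostname, table, and credential values are placeholders:

```python
# Runs inside a Databricks notebook, where `spark` is the active SparkSession.
# All option values below are placeholders to fill in.
remote_table = (spark.read
    .format("mysql")                    # native MySQL connector, DBR 11.2+
    .option("host", "hostname")
    .option("port", "3306")
    .option("database", "database_name")
    .option("dbtable", "table_name")
    .option("user", "username")
    .option("password", "password")
    .load()
)
```

Because this goes through the Spark JDBC layer rather than a Python driver on the notebook's driver node, it sidesteps the client-library differences that can cause one Python connector to authenticate where another fails.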