We are porting some code from SSIS to Python. As part of this project, I’m recreating some packages but I’m having issues with the database access. I’ve managed to query the DB like this:
employees_table = (spark.read
.format("jdbc")
.option("url", "jdbc:sqlserver://dev.database.windows.net:1433;database=Employees;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;")
.option("query", query)
.option("user", username)
.option("password", password)
.load()
)
With this I make API calls, and want to put the results into the DB, but everything I’ve tried causes errors.
df.write.method("append")
.format("jdbc")
.option("url", "jdbc:sqlserver://dev.database.windows.net:1433;database=Employees;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;")
.option("v_SQL_Insert", query)
.option("user", username)
.option("password", password)
.option("query",v_SQL_Insert)
.save()</p>
Gives me the error AttributeError: ‘DataFrame’ object has no attribute ‘write’.
I get the same error using spark.write or if I try creating an actual dataframe, populate it and try to use the write function.
My question is, what is the best way to populate a db table from Python? Is it to create a dataframe and save it to the DB, or create an SQL command? And how do we go about sending that data in?
2
Answers
Are you using Pandas by any chance? If df is not a Spark DataFrame you’ll often see this error, most commonly if it’s in fact a Pandas DataFrame (which, like the error message says, has no attribute ‘write’.)
The Spark JDBC DataWriter tutorial code here works just fine
So the main approach is:
If you don’t have a spark data frame you will get this error
AttributeError: 'DataFrame' object has no attribute 'write'.
Please follow below syntax and steps to solve the above error. I tried to reproduce it in my environment and I got the below results:
Output: