I have use case where after inserting a row in mysql table i want to fetch back the id of the last inserted row. I wanted to run a MYSQL query in aws glue for the same. I am new to pyspark and wasn’t able to find a whole lot for the same.Here is the code to connect to db.
My custom query
INSERT INTO users (name,email) VALUES ('abc1234','[email protected]');
SELECT LAST_INSERT_ID();
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
# Create a SparkContext and GlueContext
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
# Define your RDS database endpoint, username, and password
source_host = "xxx"
source_db = "xxx"
source_port = "3306"
source_driver = "jdbc"
source_db_type = "mysql"
source_user = "xxx"
source_password = "xxx"
# Define the JDBC URL
source_connection_url = f'{source_driver}:{source_db_type}://{source_host}:{source_port}/{source_db}'
source_table = "users"
source_connection_mysql5_options = {
"url": source_connection_url,
"dbtable": source_table,
"user": source_user,
"password": source_password
}
source_dynamic_frame = glueContext.create_dynamic_frame.from_options(
connection_type="mysql",connection_options=source_connection_mysql5_options)
source_dynamic_frame.printSchema().
2
Answers
with mysql ucan create id with AUTO_INCREMENT
And then you can use filter() of dynamicframe:
Solution 1:
check this : Define connection options
you can add
sampleQuery
attribute where you can add custom query.Solution 2:(to be checked)
you can a specify your query in
dbtable
for exampleSolution 3:
Create Mysql View based on your custom query that you will use as source in Glue