skip to Main Content

I have use case where after inserting a row in mysql table i want to fetch back the id of the last inserted row. I wanted to run a MYSQL query in aws glue for the same. I am new to pyspark and wasn’t able to find a whole lot for the same.Here is the code to connect to db.
My custom query

INSERT INTO users (name,email) VALUES ('abc1234','[email protected]');
SELECT LAST_INSERT_ID();
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Create a SparkContext and GlueContext
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# Define your RDS database endpoint, username, and password
source_host = "xxx"
source_db = "xxx"
source_port = "3306"
source_driver = "jdbc"
source_db_type = "mysql"
source_user = "xxx"
source_password = "xxx"

# Define the JDBC URL
source_connection_url = f'{source_driver}:{source_db_type}://{source_host}:{source_port}/{source_db}'
source_table = "users"
source_connection_mysql5_options = {
    "url": source_connection_url,
    "dbtable": source_table,
    "user": source_user,
    "password": source_password
}
source_dynamic_frame = glueContext.create_dynamic_frame.from_options(
                        connection_type="mysql",connection_options=source_connection_mysql5_options)
source_dynamic_frame.printSchema().

2

Answers


  1. with mysql ucan create id with AUTO_INCREMENT

    CREATE TABLE table_name (
     id MEDIUMINT NOT NULL AUTO_INCREMENT,
     name CHAR(30) NOT NULL,
     PRIMARY KEY (id));
    

    And then you can use filter() of dynamicframe:

    # Filter records based on the maximum value of a column
    max_value = source_dynamic_frame.max('id')
    filtered_dynamic_frame = dynamic_frame.filter(lambda x: x['id'] == max_value)
    
    Login or Signup to reply.
  2. Solution 1:

    check this : Define connection options

    you can add sampleQuery attribute where you can add custom query.

    Solution 2:(to be checked)

    you can a specify your query in dbtable for example

    source_connection_mysql_options = {
      "url": source_connection_url,
      "dbtable": "(SELECT LAST_INSERT_ID()) AS lastInserted",
      "user": source_user,
      "password": source_password
    }
    

    Solution 3:

    Create Mysql View based on your custom query that you will use as source in Glue

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search