I use MWAA to run my SQL jobs, which are executed on RDS MySQL Community. For example, I start with job ‘A’, and once it’s completed, I proceed to run job ‘B’.
In job ‘A’, I use insert SQL to insert data into ‘a_table’. After completion, job ‘B’ SQL queries the data inserted into ‘a_table’ by job ‘A’ and then inserts it into ‘b_table’.
Currently, I’ve observed that although job ‘A’ has completed and committed, for instance, inserting 10 records into ‘a_table’, logically job ‘B’ should retrieve those 10 records and insert them into ‘b_table’.
However, sometimes it fails to retrieve all records; it might only fetch 8 records and insert them into ‘b_table’. Yet, if I manually rerun job ‘B’ for the same DAG run later on, it successfully retrieves all 10 records from ‘a_table’.
So, I’m quite certain it’s not an SQL issue but rather a problem occurring when running job ‘B’, where at times, not all 10 records from ‘a_table’ are written into ‘a_table’, even though job ‘A’ has completed successfully.
a job sql:
INSERT INTO a_table (
DAG_START_TIME
, DAG_END_TIME
, A
, B
, C
, UPDATE_TIME
)
SELECT
%(start)s AS DAG_START_TIME
, %(end)s AS DAG_END_TIME
, A
, B
, C
, UPDATE_TIME
FROM ABC
WHERE `DAG_START_TIME` = %(start)s
ON DUPLICATE KEY UPDATE
`UPDATE_TIME` = CURRENT_TIMESTAMP(6)
;
b job sql:
INSERT INTO b_table (
DAG_START_TIME
, DAG_END_TIME
, A
, B
, C
, UPDATE_TIME
)
SELECT
%(start)s AS DAG_START_TIME
, %(end)s AS DAG_END_TIME
, A
, B
, C
, UPDATE_TIME
FROM a_table
WHERE `DAG_START_TIME` = %(start)s
;
I’ve checked RDS CPU Utilization and Write Latency, both of which are well below threshold levels.
Is there any advise on how to address this issue? It will help me a lot.
2
Answers
Asynchronous behavior of committed transactions can may cause delayed visibility and job B might not see all the records immediately. still make sure that job A is fully committed before B starts
You can add a delay or wait between A and B to make sure the data committed by A is fully visible by B
You can also increase your transaction isolation level. the default is usually
REPEATABLE READ
, you might want to tryREAD COMMITTED
If you want to take it steps further you can consider:
innodb_flush_log_at_trx_commit
orsync_binlog
can influence transaction durability and visibility]To be acid compliant, in your RDS Parameter Group,
The BEST always and sometimes you may find the recording of the data is not
instantaneous. MySQL has many things to accomplish in every second of uptime.