I am investigating a performance issue with a Spring Boot application that uses Spring Batch 4.2.0 and Postgres 14.5 and ran into an interesting issue.
One of the top queries is the following. Explain plan shows that it executes a FTS(actually two).
The query Postgres side:
SELECT SE.STEP_EXECUTION_ID, SE.STEP_NAME, SE.START_TIME, SE.END_TIME, SE.STATUS, SE.COMMIT_COUNT, SE.READ_COUNT, SE.FILTER_COUNT, SE.WRITE_COUNT,
SE.EXIT_CODE, SE.EXIT_MESSAGE, SE.READ_SKIP_COUNT, SE.WRITE_SKIP_COUNT, SE.PROCESS_SKIP_COUNT, SE.ROLLBACK_COUNT, SE.LAST_UPDATED, SE.VERSION,
JE.JOB_EXECUTION_ID, JE.START_TIME, JE.END_TIME, JE.STATUS, JE.EXIT_CODE, JE.EXIT_MESSAGE, JE.CREATE_TIME, JE.LAST_UPDATED, JE.VERSION
from N_BATCH_JOB_EXECUTION JE, N_BATCH_STEP_EXECUTION SE
where
SE.JOB_EXECUTION_ID in
(SELECT JOB_EXECUTION_ID
from N_BATCH_JOB_EXECUTION
where JE.JOB_INSTANCE_ID = $1)
and SE.JOB_EXECUTION_ID = JE.JOB_EXECUTION_ID
and SE.STEP_NAME = $2
order by SE.START_TIME desc, SE.STEP_EXECUTION_ID desc
The explain plan:
Sort (cost=154.08..154.11 rows=13 width=217)
Sort Key: se.start_time DESC, se.step_execution_id DESC
-> Hash Join (cost=69.71..153.83 rows=13 width=217)
Hash Cond: (se.job_execution_id = je.job_execution_id)
Join Filter: (SubPlan 1)
-> Bitmap Heap Scan on n_batch_step_execution se (cost=4.49..88.54 rows=26 width=156)
Recheck Cond: ((step_name)::text = 'notf0270f8e50fd95b44da8b34cb1ea829cc74'::text)
-> Bitmap Index Scan on n_batch_step_exec_step_name_job_exec_id_idx (cost=0.00..4.48 rows=26 width=0)
Index Cond: ((step_name)::text = 'notf0270f8e50fd95b44da8b34cb1ea829cc74'::text)
-> Hash (cost=45.10..45.10 rows=1610 width=77)
-> Seq Scan on n_batch_job_execution je (cost=0.00..45.10 rows=1610 width=77)
SubPlan 1
-> Result (cost=0.00..45.10 rows=1610 width=8)
One-Time Filter: (je.job_instance_id = 1)
-> Seq Scan on n_batch_job_execution (cost=0.00..45.10 rows=1610 width=8)
This query didn’t seem quite right to me, tried to make it better, got something and decided to check the original. Found the query in Spring Batch source code(JdbcStepExecutionDao.java) and seems correct.
But it’s slightly different, enough to produce a different explain plan. Basically the subquery is not using an alias defined in the outer query(where JOB_INSTANCE_ID =… instead of **JE.**JOB_INSTANCE_ID = …)
where
SE.JOB_EXECUTION_ID in
(SELECT JOB_EXECUTION_ID
from N_BATCH_JOB_EXECUTION
where JOB_INSTANCE_ID = $1)
Explain plan looks much better:
Sort (cost=17.74..17.75 rows=1 width=217)
Sort Key: se.start_time DESC, se.step_execution_id DESC
-> Nested Loop (cost=0.59..17.73 rows=1 width=217)
-> Nested Loop (cost=0.30..16.35 rows=1 width=77)
-> Index Scan using n_batch_job_execution_job_instance_id_idx on n_batch_job_execution (cost=0.15..8.17 rows=1 width=8)
Index Cond: (job_instance_id = 1)
-> Index Scan using n_batch_job_execution_pkey on n_batch_job_execution je (cost=0.15..8.17 rows=1 width=69)
Index Cond: (job_execution_id = n_batch_job_execution.job_execution_id)
-> Index Scan using n_batch_step_exe_idx on n_batch_step_execution se (cost=0.29..1.37 rows=1 width=156)
Index Cond: (job_execution_id = je.job_execution_id)
Filter: ((step_name)::text = 'notf0270f8e50fd95b44da8b34cb1ea829cc74'::text)
The question I have is how did that query get re-written like that? Perhaps Postgres server does that, maybe the JDBC driver or some other layer?
2
Answers
thank you, Mahmoud, I agree with you. The explanation is quite simple in the end, 4.2.0.RELEASE contains the query that has this performance issue, and that's what the application uses, but I've checked the query in 4.2.x tree, my mistake. It looks like you've fixed the query with following commit: https://github.com/spring-projects/spring-batch/commit/47a6f019e3c657e1cc4625a9c5d142a4601c8907?diff=split
Spring Batch does not re-write queries. The query you see in the code is the one that is sent to your database.