i have a query in postgres which is taking 1 min and a few seconds to execute, found out that it’s due to the order by clause. my understanding is that order by clause is one of the last to execute and it will order the resultSet only. if i remove the order by clause the query gets executed under 1 second. how can i optimize it? note that the resultSet size is 3410 records. is it actually supposed to take 1 min trying to sort 3000 records? or is it going through the whole table? tha table itself has more than 700k records
below is the query
SELECT aggregatet0_.id AS id,
aggregatet0_.pan AS col_1_0,
aggregatet0_.account_number AS col_2_0,
aggregatet0_.local_date AS col_3_0,
aggregatet0_.local_time AS col_4_0,
aggregatet0_.stan AS col_5_0,
aggregatet0_.rrn AS col_6_0,
aggregatet0_.other_ref AS col_7_0,
aggregatet0_.sequence_number AS col_8_0,
aggregatet0_.amount AS col_9_0,
aggregatet0_.transaction_type AS col_10_0,
aggregatet0_.cur_code AS col_11_0,
aggregatet0_.data AS col_12_0,
aggregatet0_.posting_date AS col_13_0,
aggregatet0_.transmission_date_time AS col_14_0,
aggregatet0_.response_code AS col_15_0,
aggregatet0_.issuer AS col_16_0,
aggregatet0_.recon_status AS col_17_0,
aggregatet0_.matched_in AS col_18_0,
aggregatet0_.in_review AS col_19_0,
aggregatet0_.notes AS col_20_0,
aggregatet0_.recon_account AS col_21_0
FROM recon_core.aggregate_transaction aggregatet0_
LEFT OUTER JOIN recon_core.recon_account_type reconaccou1_
ON ( reconaccou1_.id = aggregatet0_.recon_account_type )
WHERE ( Coalesce('MATCHED', NULL, '') = ''
OR aggregatet0_.recon_status IN ( 'MATCHED' ) )
AND ( Coalesce('RAAST', 'Digital', 'POS', 'Branch', 'ATM', '1LINK', NULL,
'') =
''
OR reconaccou1_.account_type IN (
'RAAST', 'Digital', 'POS', 'Branch',
'ATM', '1LINK' ) )
AND ( Coalesce(NULL, NULL, '') = ''
OR aggregatet0_.destination IN ( NULL ) )
AND ( ( NULL IS NULL )
AND ( NULL IS NULL )
OR Extract(hour FROM aggregatet0_.capture_date_time) BETWEEN NULL
AND
NULL )
AND ( Coalesce(NULL, NULL, '') = ''
OR aggregatet0_.recon_account IN ( NULL ) )
AND ( 17001 IN ( 17001 )
OR aggregatet0_.recon_account_id IN ( 17001 ) )
AND aggregatet0_.posting_date = '2023-06-05'
AND ( Coalesce('DEPOSIT', 'FUNDS-TRANSFER-OWN', 'CASH-WITHDRAWAL',
'CASH-WITHDRAWAL-REVERSAL', 'PURCHASE',
'FUNDS-TRANSFER-SWIFT',
'BALANCE-INQUIRY', 'REFUND', 'CASH-WITHDRAWAL-RETRACT',
'BILL-PAYMENT',
'FUNDS-TRANSFER-LOCAL', 'FUNDS-TRANSFER-INTERBANK', NULL, '')
= ''
OR aggregatet0_.transaction_type IN (
'DEPOSIT', 'FUNDS-TRANSFER-OWN',
'CASH-WITHDRAWAL', 'CASH-WITHDRAWAL-REVERSAL',
'PURCHASE',
'FUNDS-TRANSFER-SWIFT',
'BALANCE-INQUIRY', 'REFUND',
'CASH-WITHDRAWAL-RETRACT',
'BILL-PAYMENT'
,
'FUNDS-TRANSFER-LOCAL', 'FUNDS-TRANSFER-INTERBANK' ) )
ORDER BY aggregatet0_.id ASC
LIMIT 15;
if you remove ORDER BY aggregatet0_.id ASC
part, it works fine.
below are the index in the table , all binary tree based
id Primary key
transaction_type
posting_date
UPDATE
providing query plan below
Limit (cost=0.42..2151.51 rows=15 width=2218) (actual time=56156.995..56157.052 rows=15 loops=1)
-> Nested Loop (cost=0.42..200051.50 rows=1395 width=2218) (actual time=56156.993..56157.048 rows=15 loops=1)
Join Filter: (aggregatet0_.recon_account_type = reconaccou1_.id)
Rows Removed by Join Filter: 45
-> Index Scan using pk__aggregat__3213e83f059b2b7d on aggregate_transaction aggregatet0_ (cost=0.42..199876.96 rows=2092 width=2222) (actual time=56156.569..56156.612 rows=15 loops=1)
Filter: (((recon_status)::text = 'MATCHED'::text) AND (posting_date = '2023-06-05'::date) AND ((transaction_type)::text = ANY ('{DEPOSIT,FUNDS-TRANSFER-OWN,CASH-WITHDRAWAL,CASH-WITHDRAWAL-REVERSAL,PURCHASE,FUNDS-TRANSFER-SWIFT,BALANCE-INQUIRY,REFUND,CASH-WITHDRAWAL-RETRACT,BILL-PAYMENT,FUNDS-TRANSFER-LOCAL,FUNDS-TRANSFER-INTERBANK}'::text[])))
Rows Removed by Filter: 752365
-> Materialize (cost=0.00..1.19 rows=6 width=4) (actual time=0.027..0.028 rows=4 loops=15)
-> Seq Scan on recon_account_type reconaccou1_ (cost=0.00..1.16 rows=6 width=4) (actual time=0.401..0.404 rows=4 loops=1)
Filter: ((account_type)::text = ANY ('{RAAST,Digital,POS,Branch,ATM,1LINK}'::text[]))
Rows Removed by Filter: 5
Planning Time: 0.330 ms
Execution Time: 56157.120 ms
2
Answers
Looking into my crystal ball, I’d say that you should use
My guess is that the slow plan tries to scan the table in
ORDER BY
order using an index and using a nested loop join to maintain the order until it finds enough rows. Unfortunately, there are way fewer rows than PostgreSQL guessed. Using the modifiedORDER BY
clause keeps PostgreSQL from using the index.Anything that tries to summarize query execution into one sentence is going to be wrong. In this case, the order is achieved at the very beginning of the query, by reading an index in index order, and is then just maintained throughout the query. It is not executed at the end, and is not really executed at all, it is just an inherit property of a btree index scan.
Finding the 15 tallest people in the Chicago is obviously a lot harder than finding 15 random people in Chicago, and then sorting that 15 by height. Without the ORDER BY but keeping the LIMIT, you are doing the 2nd of those things. It gives no indication of how long it should/will take to do the first of those things
You would probably benefit substantially by an index on
(recon_status, posting_date, id)
. It is hard to say how much you would benefit, because we don’t know how much of the selectivity is due to the transaction_type condition versus the recon_status and posting_date conditions. The EXPLAIN ANALYZE doesn’t give us that level of detail.