Postgresql Select query optimization using Order by clause

FaheemSultan
June 13, 2023
244 views
0 votes
2 Answers

i have a query in postgres which is taking 1 min and a few seconds to execute, found out that it’s due to the order by clause. my understanding is that order by clause is one of the last to execute and it will order the resultSet only. if i remove the order by clause the query gets executed under 1 second. how can i optimize it? note that the resultSet size is 3410 records. is it actually supposed to take 1 min trying to sort 3000 records? or is it going through the whole table? tha table itself has more than 700k records

below is the query

    SELECT aggregatet0_.id                     AS id,
       aggregatet0_.pan                    AS col_1_0,
       aggregatet0_.account_number         AS col_2_0,
       aggregatet0_.local_date             AS col_3_0,
       aggregatet0_.local_time             AS col_4_0,
       aggregatet0_.stan                   AS col_5_0,
       aggregatet0_.rrn                    AS col_6_0,
       aggregatet0_.other_ref              AS col_7_0,
       aggregatet0_.sequence_number        AS col_8_0,
       aggregatet0_.amount                 AS col_9_0,
       aggregatet0_.transaction_type       AS col_10_0,
       aggregatet0_.cur_code               AS col_11_0,
       aggregatet0_.data                   AS col_12_0,
       aggregatet0_.posting_date           AS col_13_0,
       aggregatet0_.transmission_date_time AS col_14_0,
       aggregatet0_.response_code          AS col_15_0,
       aggregatet0_.issuer                 AS col_16_0,
       aggregatet0_.recon_status           AS col_17_0,
       aggregatet0_.matched_in             AS col_18_0,
       aggregatet0_.in_review              AS col_19_0,
       aggregatet0_.notes                  AS col_20_0,
       aggregatet0_.recon_account          AS col_21_0
FROM   recon_core.aggregate_transaction aggregatet0_
       LEFT OUTER JOIN recon_core.recon_account_type reconaccou1_
                    ON ( reconaccou1_.id = aggregatet0_.recon_account_type )
WHERE  ( Coalesce('MATCHED', NULL, '') = ''
          OR aggregatet0_.recon_status IN ( 'MATCHED' ) )
       AND ( Coalesce('RAAST', 'Digital', 'POS', 'Branch', 'ATM', '1LINK', NULL,
             '') =
             ''
              OR reconaccou1_.account_type IN (
                 'RAAST', 'Digital', 'POS', 'Branch',
                 'ATM', '1LINK' ) )
       AND ( Coalesce(NULL, NULL, '') = ''
              OR aggregatet0_.destination IN ( NULL ) )
       AND ( ( NULL IS NULL )
             AND ( NULL IS NULL )
              OR Extract(hour FROM aggregatet0_.capture_date_time) BETWEEN NULL
                 AND
                 NULL )
       AND ( Coalesce(NULL, NULL, '') = ''
              OR aggregatet0_.recon_account IN ( NULL ) )
       AND ( 17001 IN ( 17001 )
              OR aggregatet0_.recon_account_id IN ( 17001 ) )
       AND aggregatet0_.posting_date = '2023-06-05'
       AND ( Coalesce('DEPOSIT', 'FUNDS-TRANSFER-OWN', 'CASH-WITHDRAWAL',
                   'CASH-WITHDRAWAL-REVERSAL', 'PURCHASE',
             'FUNDS-TRANSFER-SWIFT',
                   'BALANCE-INQUIRY', 'REFUND', 'CASH-WITHDRAWAL-RETRACT',
             'BILL-PAYMENT',
                   'FUNDS-TRANSFER-LOCAL', 'FUNDS-TRANSFER-INTERBANK', NULL, '')
             = ''
              OR aggregatet0_.transaction_type IN (
                     'DEPOSIT', 'FUNDS-TRANSFER-OWN',
                     'CASH-WITHDRAWAL', 'CASH-WITHDRAWAL-REVERSAL',
                                                    'PURCHASE',
                     'FUNDS-TRANSFER-SWIFT',
                     'BALANCE-INQUIRY', 'REFUND',
                                                    'CASH-WITHDRAWAL-RETRACT',
                     'BILL-PAYMENT'
                     ,
                     'FUNDS-TRANSFER-LOCAL', 'FUNDS-TRANSFER-INTERBANK' ) )
ORDER  BY aggregatet0_.id ASC
LIMIT  15;

if you remove ORDER BY aggregatet0_.id ASC part, it works fine.

below are the index in the table , all binary tree based

id Primary key
transaction_type
posting_date

UPDATE

providing query plan below

 Limit  (cost=0.42..2151.51 rows=15 width=2218) (actual time=56156.995..56157.052 rows=15 loops=1)
  ->  Nested Loop  (cost=0.42..200051.50 rows=1395 width=2218) (actual time=56156.993..56157.048 rows=15 loops=1)
        Join Filter: (aggregatet0_.recon_account_type = reconaccou1_.id)
        Rows Removed by Join Filter: 45
        ->  Index Scan using pk__aggregat__3213e83f059b2b7d on aggregate_transaction aggregatet0_  (cost=0.42..199876.96 rows=2092 width=2222) (actual time=56156.569..56156.612 rows=15 loops=1)
              Filter: (((recon_status)::text = 'MATCHED'::text) AND (posting_date = '2023-06-05'::date) AND ((transaction_type)::text = ANY ('{DEPOSIT,FUNDS-TRANSFER-OWN,CASH-WITHDRAWAL,CASH-WITHDRAWAL-REVERSAL,PURCHASE,FUNDS-TRANSFER-SWIFT,BALANCE-INQUIRY,REFUND,CASH-WITHDRAWAL-RETRACT,BILL-PAYMENT,FUNDS-TRANSFER-LOCAL,FUNDS-TRANSFER-INTERBANK}'::text[])))
              Rows Removed by Filter: 752365
        ->  Materialize  (cost=0.00..1.19 rows=6 width=4) (actual time=0.027..0.028 rows=4 loops=15)
              ->  Seq Scan on recon_account_type reconaccou1_  (cost=0.00..1.16 rows=6 width=4) (actual time=0.401..0.404 rows=4 loops=1)
                    Filter: ((account_type)::text = ANY ('{RAAST,Digital,POS,Branch,ATM,1LINK}'::text[]))
                    Rows Removed by Filter: 5
Planning Time: 0.330 ms
Execution Time: 56157.120 ms

Answers

- LaurenzAlbe
- June 7, 2023 at 3:49 pm
- 0 votes
0
Looking into my crystal ball, I’d say that you should use
```
ORDER BY aggregatet0_.id + 0
```
My guess is that the slow plan tries to scan the table in ORDER BY order using an index and using a nested loop join to maintain the order until it finds enough rows. Unfortunately, there are way fewer rows than PostgreSQL guessed. Using the modified ORDER BY clause keeps PostgreSQL from using the index.
Login or Signup to reply.

- jjanes
- June 7, 2023 at 8:40 pm
- 0 votes
0
my understanding is that order by clause is one of the last to execute and it will order the resultSet only

Anything that tries to summarize query execution into one sentence is going to be wrong. In this case, the order is achieved at the very beginning of the query, by reading an index in index order, and is then just maintained throughout the query. It is not executed at the end, and is not really executed at all, it is just an inherit property of a btree index scan.

if i remove the order by clause the query gets executed under 1 second.

Finding the 15 tallest people in the Chicago is obviously a lot harder than finding 15 random people in Chicago, and then sorting that 15 by height. Without the ORDER BY but keeping the LIMIT, you are doing the 2nd of those things. It gives no indication of how long it should/will take to do the first of those things

You would probably benefit substantially by an index on (recon_status, posting_date, id). It is hard to say how much you would benefit, because we don’t know how much of the selectivity is due to the transaction_type condition versus the recon_status and posting_date conditions. The EXPLAIN ANALYZE doesn’t give us that level of detail.

Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.