Postgresql - Slow query when sorting large data

dafie
August 9, 2023
179 views
0 votes
2 Answers

This query:

select h.id
from history h
where
    h.processed = true
order by coalesce (h.last_process_attempt_time, h.delivery_time)
limit 5

executes fast (6ms), because almost every row has processed = true. But when I change to this:

explain analyze 
select h.id
from pgz75008.history h
where
    h.processed = false -- the only change is here
ORDER BY coalesce (h.last_process_attempt_time, h.delivery_time)

limit 5

now execution takes over 1.5 seconds (260x times more than previously). This is my index:

CREATE INDEX test_idx ON history USING btree (processed, last_process_attempt_time, delivery_time);

First query plan:

Limit  (cost=18242.80..18242.81 rows=5 width=45) (actual time=6.866..6.869 rows=5 loops=1)
  ->  Sort  (cost=18242.80..18468.04 rows=90095 width=45) (actual time=6.863..6.864 rows=5 loops=1)
        Sort Key: (COALESCE(last_process_attempt_time, delivery_time))
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Index Scan using test_idx on history h  (cost=0.43..16746.36 rows=90095 width=45) (actual time=0.034..6.601 rows=1149 loops=1)
              Index Cond: (processed = false)
Planning Time: 0.309 ms
Execution Time: 6.904 ms

second query plan:

Limit  (cost=331050.32..331050.92 rows=5 width=45) (actual time=1497.691..1631.789 rows=5 loops=1)
  ->  Gather Merge  (cost=331050.32..1127312.50 rows=6650224 width=45) (actual time=1484.761..1618.855 rows=5 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        ->  Sort  (cost=330050.26..334206.65 rows=1662556 width=45) (actual time=1420.135..1420.138 rows=4 loops=5)
              Sort Key: (COALESCE(last_process_attempt_time, delivery_time))
              Sort Method: top-N heapsort  Memory: 25kB
              Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
              Worker 1:  Sort Method: top-N heapsort  Memory: 25kB
              Worker 2:  Sort Method: top-N heapsort  Memory: 25kB
              Worker 3:  Sort Method: top-N heapsort  Memory: 25kB
              ->  Parallel Seq Scan on history h  (cost=0.00..302435.80 rows=1662556 width=45) (actual time=11.190..1000.568 rows=1347834 loops=5)
                    Filter: processed
                    Rows Removed by Filter: 230
Planning Time: 0.233 ms
JIT:
  Functions: 21
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 8.992 ms, Inlining 0.000 ms, Optimization 5.794 ms, Emission 61.213 ms, Total 75.999 ms
Execution Time: 1634.044 ms

Any ideas how can I make it faster?

Tags: postgresql sql

Answers

- ShakirovRamil
- August 9, 2023 at 3:57 pm
- 0 votes
0
What if
```
select h.id
from history h
where id NOT IN (SELECT ID FROM history h2 where h2.processed = true)
order by coalesce (h.last_process_attempt_time, h.delivery_time)
limit 5
```
Login or Signup to reply.

- FrankHeikens
- August 9, 2023 at 5:26 pm
- 0 votes
0
Booleans are difficult to be useful in an index, since you have only three options: null, true and false. Pretty easy to be useless.

What if you create an index like this:
```
CREATE INDEX idx_last_process_attempt_time_delivery_time
ON history (
  COALESCE (h.last_process_attempt_time, h.delivery_time)
);
```
Maybe it’s useful to include the processed column, like this:
```
CREATE INDEX idx_last_process_attempt_time_delivery_time
ON history (
  COALESCE (h.last_process_attempt_time, h.delivery_time)
)
INCLUDE (processed);
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Slow query when sorting large data

Answers