I am running a simple query to get the max ID in a table:
SELECT max(ID) FROM t WHERE m=345;
Table (t) has 20 million records and 2000 distinct values for m.
There is a primary key index on ID and an index on m.
For some reason the explain plan shows an "Index Scan Backward" on the pk index rather than scanning the index on m.
The query is taking over 10 seconds to complete.
If I prevent the use of the pk index by changing the SQL (SELECT max(ID+0) FROM t WHERE m=345;), it takes just a few milliseconds to complete.
We do a regular vacuum/analyze on the table.
I’d prefer not to add "+0" to all of the queries to solve this issue.
I could probably rewrite this SQL in other ways and get a better result, but the original SQL is so simple that the optimizer should be able to figure out the best plan.
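For example, one such rewrite (a sketch, assuming PostgreSQL 12 or later, where AS MATERIALIZED acts as an optimization fence):

WITH matched AS MATERIALIZED (
    SELECT id FROM t WHERE m = 345  -- presumably resolved via t_m_index
)
SELECT max(id) FROM matched;

The materialized CTE is evaluated first, so the m = 345 filter runs before the aggregate and the planner cannot transform the max() into a backward scan of the primary key.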
Table and Index DDL:
CREATE TABLE IF NOT EXISTS t
(
id bigint NOT NULL DEFAULT nextval('t_seq'::regclass),
c integer,
m integer,
b integer DEFAULT '-1'::integer,
p integer,
CONSTRAINT t_pkey PRIMARY KEY (id)
)
TABLESPACE pg_default;
CREATE INDEX t_m_index
ON t USING btree
(m ASC NULLS LAST)
TABLESPACE pg_default;
explain plan with workaround and original:
db=> explain (analyze, buffers, format text)
db-> select MAX(id+0) FROM t WHERE m=345;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=19418.26..19418.27 rows=1 width=8) (actual time=0.047..0.047 rows=1 loops=1)
   Buffers: shared hit=6 read=2
   ->  Bitmap Heap Scan on t  (cost=211.19..19368.82 rows=9888 width=8) (actual time=0.039..0.042 rows=2 loops=1)
         Recheck Cond: (m = 345)
         Heap Blocks: exact=2
         Buffers: shared hit=6 read=2
         ->  Bitmap Index Scan on t_m_index  (cost=0.00..208.72 rows=9888 width=0) (actual time=0.033..0.034 rows=2 loops=1)
               Index Cond: (m = 345)
               Buffers: shared hit=4 read=2
 Planning Time: 0.898 ms
 Execution Time: 0.094 ms
(11 rows)
db=> explain (analyze, buffers, format text)
db-> select MAX(id) FROM t WHERE m=345;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=464.31..464.32 rows=1 width=8) (actual time=21627.948..21627.950 rows=1 loops=1)
   Buffers: shared hit=10978859 read=124309 dirtied=1584
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.56..464.31 rows=1 width=8) (actual time=21627.945..21627.946 rows=1 loops=1)
           Buffers: shared hit=10978859 read=124309 dirtied=1584
           ->  Index Scan Backward using t_pkey on t  (cost=0.56..4524305.43 rows=9756 width=8) (actual time=21627.944..21627.944 rows=1 loops=1)
                 Index Cond: (id IS NOT NULL)
                 Filter: (m = 345)
                 Rows Removed by Filter: 11745974
                 Buffers: shared hit=10978859 read=124309 dirtied=1584
 Planning Time: 0.582 ms
 Execution Time: 21627.964 ms
(12 rows)
2 Answers
It looks to me like your data has a skewed distribution, which is causing problems here. You mention in the question:

    Table (t) has 20 million records and 2000 distinct values for m.

If the database were to assume an equal distribution of values, that would imply that each value of m would have 10,000 rows (20,000,000 / 2,000). But, again assuming equal distribution of data, the optimizer could expect to find a record where m = 345 after "randomly" examining about 2,000 records when walking the t_pkey index. This means that the optimizer has to decide which of the two approaches is likely to be less work:

1. Scan the t_pkey index in descending order of ID, filtering out results until it finds a record where m = 345. Once found, it is done processing and may return the result to the client.
2. Find all entries in t_m_index matching m = 345, then wade through all of those results to identify the maximum ID value.

Based on our estimates above, the optimizer might guess that the (vast) majority of the cost of the first plan would be scanning ~2,000 index keys and records, while the cost of the second plan would be scanning ~10,000 index keys and records plus the cost of aggregating those to find the entry with the maximum value. From that perspective the decision to select the former plan seems reasonable.
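Those estimates come straight from the table statistics, which you can inspect directly; a quick sanity check along these lines (pg_class and pg_stats are the standard catalogs):

-- planner's estimate of the table's total row count
SELECT reltuples FROM pg_class WHERE relname = 't';

-- planner's estimate of the number of distinct values of m
SELECT n_distinct FROM pg_stats WHERE tablename = 't' AND attname = 'm';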
The problem is that your data is not uniformly distributed, at least not in terms of ordering based on ID in the t_pkey index. From the second explain plan (which uses that index and represents the first of the two plans I mentioned above) we see:

    Rows Removed by Filter: 11745974

So the database ended up having to scan through more than half of the data (11.7M rows) in descending order of ID before it found the first record where m = 345. This explains why the operation is so slow. But remember that the plan was chosen because it expected to find a value much sooner, so this poor outcome was a surprise to the database.
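You can see how lopsided this is by checking where the m = 345 rows actually fall in the id range; a quick check against the table itself:

-- id range occupied by the m = 345 rows
SELECT min(id), max(id), count(*) FROM t WHERE m = 345;

-- the global maximum the backward scan starts from
SELECT max(id) FROM t;

If max(id) for m = 345 is far below the global max(id), the backward scan has to walk every row in between before it finds a match.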
One potential thing to consider here would be to create a multicolumn index on both m and ID. This would allow the database to retrieve the requested data after examining just a single index key. It would be most appropriate to consider this approach if this is an operation that you will be running frequently.
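A sketch of such an index (the name is my own invention):

CREATE INDEX t_m_id_index
    ON t USING btree
    (m, id);

With this in place, max(id) for a given m can be answered by descending directly to the last key in the m = 345 range of the index.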
Try a BTREE compound index on (m, id DESC) if you need your query to exploit an index. To satisfy the query, PG random-accesses the index to the first entry matching m=345. That first entry holds the largest id value, and Bob's your uncle. It's O(log n).

Change your index to:
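(a sketch; the index name is assumed, mirroring the DDL style above)

CREATE INDEX t_m_id_desc_index
    ON t USING btree
    (m ASC, id DESC)
    TABLESPACE pg_default;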