Some of my colleagues claimed that Postgres uses a plain varchar index slightly less readily than one with an explicit cast to text:
CREATE INDEX simple_index ON my_table(my_varchar);
CREATE INDEX text_index ON my_table((my_varchar::text));
I also wonder why EXPLAIN ANALYZE always converts the column to text:
explain analyze select * from my_table where my_varchar = 'WONDERFUL_VARCHAR';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Index Scan using simple_index on my_table (cost=0.42..8.92 rows=7 width=1192) (actual time=0.106..0.112 rows=4 loops=1)
Index Cond: ((my_varchar)::text = 'WONDERFUL_VARCHAR'::text)
Planning Time: 0.287 ms
Execution Time: 0.298 ms
I tested this on Postgres 16, and Postgres can use the plain varchar index whether or not I cast the constant to text.
What is the difference between the two indexes? Are there cases where Postgres would use the text-cast index but not the plain one?
2 Answers
Casting between text and character varying is a no-operation, since the data types are the same (with the exception of the length check in character varying). You can see that by looking at pg_cast: the cast method is 'b', that is, binary coercible. So any cast between these data types is just a decoration to keep the strict type system happy; it does not incur any overhead.
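For reference, a quick catalog lookup can confirm this (a sketch using the standard pg_cast catalog and regtype casts):

```sql
-- How does Postgres cast between text and character varying?
SELECT castsource::regtype, casttarget::regtype, castmethod
FROM pg_cast
WHERE castsource = 'text'::regtype
  AND casttarget = 'character varying'::regtype;
-- castmethod 'b' means "binary coercible": no conversion function is called.
```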
It does not matter at all which of the two index definitions you use.
Note that an expression in an index definition needs its own set of parentheses, which is why the text_index definition duplicates them; leaving out the extra pair would get you a syntax error.
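For illustration, this is the form that fails (the error message shown is typical for a stock Postgres; exact wording may vary by version):

```sql
-- Without the extra parentheses around the expression:
CREATE INDEX text_index ON my_table(my_varchar::text);
-- ERROR:  syntax error at or near "::"
```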
There is a very subtle difference between the two and a case to be made against what your colleagues suggest. You might be accidentally disabling index-only scans:
demo at db<>fiddle
If you take away the simple index, leaving just the expression-based text_index, the plan gets downgraded from an Index Only Scan to a regular Index Scan. Postgres doesn't rely on an index on an expression to extract the underlying value from it, so even if you are fetching only my_varchar, filtered exclusively on my_varchar and nothing else, you'll see it jump to the heap anyway, even though it should be able to skip that and hand you the value it just found in the index, saving a potentially expensive, needless additional step.
That's unless you explicitly INCLUDE the raw value as payload, which costs you additional space and makes things slower, because a larger index is slower to search. Still, reading the value right from the index you had to read anyway beats an additional jump from the index to the table.
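A sketch of such a covering expression index (index name invented; the INCLUDE clause is available since Postgres 11):

```sql
-- Expression index that also carries the raw column as payload,
-- so an index-only scan can return my_varchar without visiting the heap:
CREATE INDEX text_index_covering
    ON my_table ((my_varchar::text))
    INCLUDE (my_varchar);
```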
If you rarely ever do a narrow select like this, it shouldn't matter much, because the presence of any other field that is neither indexed nor INCLUDEd will require the jump from the index to the table to get the missing values. Looking at the width=1192 of your rows, you do have more stuff in there, so it may be unlikely you ever saw an Index Only Scan, and fitting it all in as included payload might exceed the index row size limit. Then again, it might fit right under it: here you can see even a slightly wider table getting entirely covered. Worth a shot if the table is fairly static.
If you really do follow the select * anti-pattern (not explicitly listing the field names you need, in the order you need them, which is doomed to break as soon as someone adds, swaps, or reorders a column), consider listing field names explicitly instead. Then see whether you really need all the columns and, if not, whether you could set up indexes covering just the ones you do need. If you can afford the space and the overhead of maintaining the index, there's some performance to be gained there.
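As a hypothetical example (the extra column and the index name are invented), a narrow select paired with a covering index could look like this:

```sql
-- Cover the lookup column plus the one extra column the query needs:
CREATE INDEX my_table_narrow
    ON my_table (my_varchar)
    INCLUDE (some_other_column);

-- Instead of SELECT *, fetch only what the index can serve:
SELECT my_varchar, some_other_column
FROM my_table
WHERE my_varchar = 'WONDERFUL_VARCHAR';
```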