HASH indexes in PostgreSQL

Hans
October 23, 2024
93 views
0 votes
2 Answers

I’m revisiting an old question from 16 years ago about PostgreSQL HASH indexes vs B-TREEs. At the time, it seemed like HASH indexes were far less efficient than B-TREEs in terms of both space and time and the consensus was that HASH indexes offered no significant performance advantage in many use cases.

I’m curious if the situation has changed in 2024?
Comment to this answer (and the linked doc) suggests that there have been some optimizations to HASH indexes in PostgreSQL that make them viable. Are they significant?

Main question:

What are the current best practices regarding when to use a HASH index over a B-TREE in Postgres?

Additional considerations:

Are there specific pitfalls to be aware of with HASH indexes, particularly on:

BOOL columns?
Columns with very few unique values (e.g., HASH indexes where only 3 distinct values exist)?

One of the comments in the original thread stated, "In fact, using the index for something that’d match 99% of rows would be massively inefficient, much slower than seqscan." Does this still hold true today, or has the landscape shifted with modern optimizations?

Would love to hear about your experiences and advice regarding HASH indexes in PostgreSQL in 2024!

Answers

- LaurenzAlbe
- October 22, 2024 at 10:36 pm
- 0 votes
0
The one thing that has changed with hash indexes is that they have become crash safe with PostgreSQL v10. Now you don’t have to rebuild the index after an operating system crash.

Other than that, nothing much has changed. Hash indexes continue to receive less love than B-tree indexes, so they usually won’t outperform B-tree indexes. If you have a column with many identical values, they are necessarily inefficient, but during my experiments in 2023 I didn’t find any realistic case where hash indexes outperformed B-tree indexes.

My advice is to forget about hash indexes.

Login or Signup to reply.

- jjanes
- October 23, 2024 at 3:51 am
- 0 votes
0
Hash indexes have had some improvements (WAL logging/crash safety added, improvement in locking to be more performant under concurrency than they were before), but nothing that would make then generally superior to btree. The improvements have been from "way worse than btree" to "now only slightly worse".

The only case where I would consider them is where the value being indexed is quite large. Values longer than about 2692 bytes will fail altogether when inserted into btree, while long values which are shorter than that will succeed but generate quite large indexes which might be less cache/IO friendly.

Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.