I have an application that deals with blockchain data, so the Postgres tables have quite a lot of fields that are fixed-size hex values.
One of the most important columns is the public_address column of the user table, which is a 20-byte hex value and the primary key of the table. I call it important because this column is used in almost all other queries within a JOIN or WHERE clause, so its indexing and query speed are crucial.
On the other hand, there is another column, nonce, in the transaction table, which is a 32-byte hex value with a single NOT NULL constraint on it. It is not used in any JOIN or WHERE clause; it only appears in SELECT statements to read the value of the column.
I am aware that bytea is more storage-efficient than varchar, but I don't know which one is better for indexes and for faster queries.
So my question is: under which circumstances should bytea be used over varchar(N) for hex values, and vice versa, taking indexing and query speed into account?
2 Answers
What you call a "hex value" is just a text representation of binary data. And that is what matters. It’s plain, raw, arbitrary, binary data. Just a sequence of bytes.
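To make the distinction concrete, here is a minimal Python sketch. The address value is invented for illustration, and the sketch only compares the text form with the bytes it stands for; it ignores whatever per-value overhead the database adds on top of either type.

```python
# Hypothetical 20-byte address written as hex text (value invented for illustration).
hex_text = "00112233445566778899aabbccddeeff00112233"

# The raw bytes that the hex text represents.
raw = bytes.fromhex(hex_text)

print(len(hex_text))  # 40 characters of text
print(len(raw))       # 20 bytes of binary data

# Hex is only one rendering of the same bytes; converting back gives the same text.
print(raw.hex() == hex_text)  # True
```

The text form needs two characters for every byte, which is where the storage difference between the two representations comes from.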
Indexing and query speed (while I don't expect them to be much different) do not matter; there is only one appropriate data type for arbitrary byte sequences: bytea.
Do not use varchar: "The characters that can be stored in any of these data types are determined by the database character set, which is selected when the database is created. Regardless of the specific character set, the character with code zero (sometimes called NUL) cannot be stored."

tl;dr. Use bytea. It will likely be more performant, but more importantly it's correct; there are some bytes which are illegal for varchar. From the docs…
varchar stores bytes which represent characters using a given character encoding. In many character encodings, certain byte combinations are illegal; they represent no character. For example, in UTF-8 a lone byte above 7F is illegal. So you can't reliably store bytes as bytes in varchar.
Instead, varchar will store a string representation of the bytes. So \x00AA will be stored not as the two bytes 00 AA but as the string "\x00AA" which, in UTF-8, is 5C 78 30 30 41 41.
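Both points can be checked outside the database. The following is a small Python sketch, not Postgres itself: it uses Python's UTF-8 codec as a stand-in for a UTF-8 encoded database, and the variable names are chosen only for this illustration.

```python
# A lone byte above 0x7F is not a valid UTF-8 sequence on its own.
try:
    b"\xaa".decode("utf-8")
    lone_byte_ok = True
except UnicodeDecodeError:
    lone_byte_ok = False
print(lone_byte_ok)  # False

# The six-character text \x00AA (backslash, x, 0, 0, A, A), encoded as UTF-8.
text = r"\x00AA"
encoded = text.encode("utf-8")
print(encoded.hex(" ").upper())  # 5C 78 30 30 41 41

# The two real bytes that the text was meant to describe.
real = bytes([0x00, 0xAA])
print(len(encoded), len(real))  # 6 2
```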
Converting back and forth between the hex characters and the real bytes is more work and more opportunities for mistakes. It’s more efficient and more correct to use bytea.
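As a rough illustration of that conversion work, here is a sketch of the kind of helpers an application ends up carrying when it handles hex text. The function names hex_to_bytes and bytes_to_hex and the sample address are hypothetical, not part of any library.

```python
def hex_to_bytes(hex_text: str, expected_len: int) -> bytes:
    """Parse hex text (with an optional 0x prefix) into exactly expected_len bytes."""
    if hex_text[:2].lower() == "0x":
        hex_text = hex_text[2:]
    # bytes.fromhex raises ValueError on non-hex characters or an odd number of digits.
    raw = bytes.fromhex(hex_text)
    if len(raw) != expected_len:
        raise ValueError(f"expected {expected_len} bytes, got {len(raw)}")
    return raw


def bytes_to_hex(raw: bytes) -> str:
    """Render bytes in one canonical form: lowercase hex with a 0x prefix."""
    return "0x" + raw.hex()


# Two spellings of the same made-up address: different strings, identical bytes.
spelling_a = "0x00112233445566778899AABBCCDDEEFF00112233"
spelling_b = "00112233445566778899aabbccddeeff00112233"

same_as_text = spelling_a == spelling_b
same_as_bytes = hex_to_bytes(spelling_a, 20) == hex_to_bytes(spelling_b, 20)
print(same_as_text, same_as_bytes)  # False True
```

If the column holds text, every writer has to agree on one spelling (case, prefix) or equality comparisons and joins can silently miss rows; with raw bytes there is only one representation of each value.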