SQL Alchemy supports multiple ways of specifying datatypes for SQL database columns. If regular Python datatypes are used, SQL Alchemy will try to sensibly map these to datatypes supported by the connected to database.
There is also the possibility of using more precicely defined types which are supplied by the SQL Alchemy framework. These types more closely match to what types are supported by specific databases. (For example Postgres, MySQL, sqlite3, …)
This seems slightly strange, since to generate a md5 hash, typically the Python code would look something like this
hashlib.md5(input_string.encode('utf-8')).hexdigest()
or this
hashlib.md5(input_string.encode('utf-8')).digest()
depending if the hash value is wanted as a str
or a binary string (bytes
).
Neither of these is exactly a UUID
, however the str
form appears to be implicitly convertable to and from a UUID
type.
- Is
UUID
the correct datatype to use? - What type should be used for a SHA-256 hash?
I have a feeling that in both cases, a fixed width binary data type would be better. However, this doesn’t appear to be supported by SQL Alchemy, at least not in combination with Postgres.
This is because the LargeBinary
SQL Alchemy datatype maps to bytea
Postgres datatype. This is a variable length data type, and it cannot be forced to be of a fixed width in the same way that varchar
can.
I have done some testing between UUID
and str
and I found that inserting data was slightly faster when using UUID
, which may not be surprising since the length is fixed rather than variable. However, the difference was small. (233 vs 250 messages inserted / second.)
I intend to do further testing using the bytes
datatype. My initial results suggest that it performs the same as str
.
2
Answers
I've come to the conclusion that a regular
varchar
is probably the best solution.There are two reasons.
varchar
datatypes. This means for anmd5
you can set the length to be 32 characters (16 bytes due to hexadecimal encoding). Something which doesn't appear to be possible with Postgresbytea
datatype. (The situation might be different for other SQL databases.)For storing MD5 and SHA-256 hashes in SQLAlchemy and PostgreSQL, the most suitable solution is to use the BYTEA type (which is represented as LargeBinary in SQLAlchemy), but with a length constraint. Although BYTEA itself is a variable-length type, PostgreSQL allows imposing a length restriction on the column.