skip to Main Content

SQL Alchemy supports multiple ways of specifying datatypes for SQL database columns. If regular Python datatypes are used, SQL Alchemy will try to sensibly map these to datatypes supported by the connected to database.

There is also the possibility of using more precicely defined types which are supplied by the SQL Alchemy framework. These types more closely match to what types are supported by specific databases. (For example Postgres, MySQL, sqlite3, …)

This question and answer seems to suggest that the UUID type should be used to represent MD5 hash values.

This seems slightly strange, since to generate a md5 hash, typically the Python code would look something like this

hashlib.md5(input_string.encode('utf-8')).hexdigest()

or this

hashlib.md5(input_string.encode('utf-8')).digest()

depending if the hash value is wanted as a str or a binary string (bytes).

Neither of these is exactly a UUID, however the str form appears to be implicitly convertable to and from a UUID type.

  • Is UUID the correct datatype to use?
  • What type should be used for a SHA-256 hash?

I have a feeling that in both cases, a fixed width binary data type would be better. However, this doesn’t appear to be supported by SQL Alchemy, at least not in combination with Postgres.

This is because the LargeBinary SQL Alchemy datatype maps to bytea Postgres datatype. This is a variable length data type, and it cannot be forced to be of a fixed width in the same way that varchar can.

I have done some testing between UUID and str and I found that inserting data was slightly faster when using UUID, which may not be surprising since the length is fixed rather than variable. However, the difference was small. (233 vs 250 messages inserted / second.)

I intend to do further testing using the bytes datatype. My initial results suggest that it performs the same as str.

2

Answers


  1. Chosen as BEST ANSWER

    I've come to the conclusion that a regular varchar is probably the best solution.

    There are two reasons.

    • Postgres (and presumably many other SQL databases) support fixed length varchar datatypes. This means for an md5 you can set the length to be 32 characters (16 bytes due to hexadecimal encoding). Something which doesn't appear to be possible with Postgres bytea datatype. (The situation might be different for other SQL databases.)
    • The values will be human readable. Rather than printing garbled nonsense when displaying binary data, hex encoded ascii is at least somewhat human readable

  2. For storing MD5 and SHA-256 hashes in SQLAlchemy and PostgreSQL, the most suitable solution is to use the BYTEA type (which is represented as LargeBinary in SQLAlchemy), but with a length constraint. Although BYTEA itself is a variable-length type, PostgreSQL allows imposing a length restriction on the column.

    from sqlalchemy import create_engine, Column, Integer, LargeBinary
    from sqlalchemy.orm import declarative_base, sessionmaker
    
    Base = declarative_base()
    
    class MyTable(Base):
        __tablename__ = 'my_table'
    
        id = Column(Integer, primary_key=True)
        md5_hash = Column(LargeBinary)  # No length restriction needed for SQLite
        sha256_hash = Column(LargeBinary) # No length restriction needed for SQLite
    
    # Use an in-memory SQLite database for this example
    engine = create_engine('sqlite:///:memory:')
    
    Base.metadata.create_all(engine)
    
    Session = sessionmaker(bind=engine)
    session = Session()
    
    # Example usage
    import hashlib
    
    input_string = "test string"
    md5_hash = hashlib.md5(input_string.encode('utf-8')).digest()
    sha256_hash = hashlib.sha256(input_string.encode('utf-8')).digest()
    
    new_record = MyTable(md5_hash=md5_hash, sha256_hash=sha256_hash)
    session.add(new_record)
    session.commit()
    
    # Querying the data back (demonstration)
    retrieved_record = session.query(MyTable).first()
    print(f"Retrieved MD5 Hash: {retrieved_record.md5_hash.hex()}")
    print(f"Retrieved SHA256 Hash: {retrieved_record.sha256_hash.hex()}")
    
    
    session.close()
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search