skip to Main Content

I am using SQLAlchemy and PostgreSQL, and I am trying to create a case-insensitive unique constraint that works like this

        UniqueConstraint(
            'country',
            'organisation_id',
            func.lower(func.trim('name')),
            name='uq_country_orgid_lower_trim_name'
        )

Ensuring a unique combination of name, country and organisation id, regardless of case and spacing in the name, i.e. "Name 1", "name1", "nAmE 1" would all be handled as "name1" in the check.

I want to make sure that I do not change the actual case or spacing of the name saved in the database.

How would I go about this?

3

Answers


  1. You can use the text construct to declare an expression within an index declaration, like text("""lower(replace(name, ' ', ''))""").

    An example script would look like this:

    import contextlib
    
    import sqlalchemy as sa
    from sqlalchemy.exc import IntegrityError
    
    metadata = sa.MetaData()
    tbl = sa.Table(
        't79227338',
        metadata,
        sa.Column('country', sa.String, nullable=False),
        sa.Column('organisation', sa.String, nullable=False),
        sa.Column('name', sa.String, nullable=False),
        sa.Index(
            'uq_country_orgid_lower_trim_name',
            'country',
            'organisation',
            sa.text("""lower(replace(name, ' ', ''))"""),
            unique=True,
        ),
    )
    
    engine = sa.create_engine('postgresql+psycopg2:///so', echo=True)
    metadata.drop_all(engine)
    metadata.create_all(engine)
    
    names = ['Name 1', 'name1', 'nAmE 1', ' naME 1 ']
    values = [
        {'country': 'Ruritania', 'organisation': 'Palace Guard', 'name': n} for n in names
    ]
    
    for vals in values:
        with contextlib.suppress(IntegrityError):
            with engine.begin() as conn:
                conn.execute(tbl.insert().values(**vals))
    
    with engine.connect() as conn:
        rows = conn.execute(sa.select(tbl)).all()
    
        assert len(rows) == 1
        assert rows[0].name == 'Name 1'
    

    This is the generated index DDL:

    CREATE UNIQUE INDEX uq_country_orgid_lower_trim_name 
      ON t79227338 (country, organisation, lower(replace(name, ' ', '')))
    

    Using the information from this answer we can see that the index contains this data:

    b'x15Ruritaniax1bPalace Guardrname1x00x00x00'
    

    showing that the indexed name is indeed lower-cased and free of spaces.


    Resources

    Login or Signup to reply.
  2. I tend to avoid putting scripting this type requirement in SQLAlchemy (or any other obscurification [ORM] language). You cannot guarantee access will always be through that script. Rather direct ddl. One possibility is creating a generated column and a SQL function to create the single column for you index. This way you enforce the requirement at the lowest possible level thus ensuring it will always be employed. So, what is needed:

    • Define a function to create the value for the column.
    • Define a column as generated always as.
    • Define a unique index on the column.

    As an example:

    -- Assume an existing table defined as:  
    create table test( id     integer generated always as identity
                                      primary key 
                    , name    text 
                    , country text
                    , org_id  integer
                    );
    
    -- Now Define a function to create the value for the unique column/index.
    create or replace function construct_ncoi( name_in    text
                                             , country_in text 
                                             , org_id_in  integer
                                             ) 
       returns text
       language sql 
       immutable 
    as $$
       select lower(replace( concat(name_in, country_in, org_id_in::integer),' ','') );
    $$; 
    
    -- Create the generated column and define a unique index on it.
    alter table test
        add ncoi    text generated always as ( construct_ncoi(name, country, org_id) ) stored;
    alter table test
       add  constraint unique_noci unique (ncoi);
    

    With the above your constraint is always enforced even if another team shares the data but chooses another ORM or the DBA attempts an emergency change via direct SQL. See Example Here.
    Yes, it is somewhat longer and makes the tradeoff of some storage for the requirement of never being bypassed. (Well a superuser can get around it, but that is always possible.)

    Login or Signup to reply.
  3. This is mostly the ORM version of @snakecharmerb and avoids the use of text (nothing wrong with sanitized text, it is just a preference). This answer suggests using a unique index, instead of unique constraint because SQLA lacks support.

    from sqlalchemy import create_engine, func, Index, select, column
    from sqlalchemy.exc import IntegrityError
    from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, Session
    
    class Base(DeclarativeBase): ...
    
    class SomeTable(Base):
        __tablename__ = "some_table"
        __table_args__ = (
            Index(
                "uq_country_orgid_lower_trim_name",
                "country",
                "organization_id",
                func.lower(func.replace(column("name"), " ", "")),
                unique=True,
            ),
        )
        name: Mapped[str]
        id: Mapped[int] = mapped_column(primary_key=True)
        country: Mapped[str]
        organization_id: Mapped[int]
    
    engine = create_engine("postgresql+psycopg://")
    Base.metadata.create_all(engine)
    
    with Session(engine) as session:
        session.add(SomeTable(country="C1", organization_id=1, name="NAME 1"))
        session.commit()
    
    with Session(engine) as session:
        rows = (
            SomeTable(country="C1", organization_id=1, name="NAME1"),
            SomeTable(country="C1", organization_id=1, name="Name 1"),
            SomeTable(country="C1", organization_id=1, name="name1"),
            SomeTable(country="C1", organization_id=1, name="nAmE 1"),
        )
        for row in rows:
            try:
                with session.begin_nested():
                    session.add(row)
            except IntegrityError:
                print("failed as expected")
    
    with Session(engine) as session:
        row = session.scalar(select(SomeTable))
        assert row
        assert row.name == "NAME 1"
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search