I want to have a case-insensitive unique key on a large table.
I’m looking for the fastest approach.
I found two options: my_table_a
with a generated column and my_table_b
with a unique index
CREATE TABLE my_table_a
(
my_table_id_a int GENERATED ALWAYS AS IDENTITY
CONSTRAINT pk_my_table_a PRIMARY KEY,
name text NOT NULL,
key text GENERATED ALWAYS AS (LOWER(name)::text) STORED NOT NULL
);
CREATE TABLE my_table_b
(
my_table_id_b int GENERATED ALWAYS AS IDENTITY
CONSTRAINT pk_my_table_b PRIMARY KEY,
name text NOT NULL
);
CREATE UNIQUE INDEX unique_name ON my_table_b (LOWER(name));
But I’m not sure what is happening on an insert.
I fear that for a single insert, it is required to calculate for all rows the unique key.
What do you think is the fastest solution?
2
Answers
In the second case, you have an index on the lowercased name. This means that the lowercase names are already calculated and stored in a data structure that can be searched quickly (the index). When you insert a new name, its lowercase is computed, and then searched in this index against the recalculated lowercase names of the preexisting records.
In the first case, you only have a generated column with no promise of uniqueness. You could add a unique constraint to it, which would make it pretty similar to the second solution.
As demonstrated in a similar thread here, it’ll be even faster if you use the
lower()
expression-based index, but also make it useC
collation:Here you can find a few benchmarks comparing
citext
,lower()
expression based index with default collation, a plain index using a custom case-insensitive collation as well ascitext
with aC
collation.You can tweak it or run your own tests but in that one, the example above won.
If you wanted to speed anything up but generating the column, that’s not necessary – it will only duplicate the work and take more space.
You can set up
unique
as aconstraint
or as anindex
, but unique constraints are technically always enforced by an implicitly created index. When you add new values, Postgres tries to save them to the table as well as ingest them into the index by searching for a spot in it – if it finds that its occupied, depending on currently configured deferrability it’ll raise an exception. You don’t have to fear it’ll ever try to seq scan the table to check that, all of this logic takes place in the index.Indexes, indexes later added to the table as constraints and constraints specified on the table directly have slightly different syntax, behaviour and limitations, e.g.: the constraint doesn’t let you pick the collation. Hence, in this case, your best bet is the expression index with a collation, directly on the
name
column.