For my database I want natural (‘A2′<‘A12’), case-insensitive sorting, so for testing I created a new collation with
CREATE COLLATION tomscollation (provider = icu, locale = 'de-u-kn-true-ks-level2');
My end goal is to use such a collation as the default collation for the whole database so that every text column or index uses it by default.
This source notes that for PostgreSQL 15 non-deterministic collations are not yet supported at database level.
This is where my confusion starts. Looking at my collation the database says its deterministic (see screenshot). However the documentation says that only byte-order collations can be deterministic but mine is clearly not byte-order.
So what am I missing here?
Why am I able to create the following database and what are the side effects I might be unaware of?
CREATE DATABASE tomsdb TEMPLATE template0 LOCALE_PROVIDER icu ICU_LOCALE 'de-u-kn-true-ks-level2'
2
Answers
kn=true
means that Unicode Locale Extension numeric ordering will be default."non-deterministic collations are not yet supported at database level." means that by default,
select 'a' = 'A';
will return false.it return:
so on a database level, kn=true did work as expected. but
ks-level2
should by default makeselect 'a1' = 'A1'
return true.so you need create collation to do case insensitive sorting, like following way:
de-u-kn-true-ks-level2
is by definition not deterministic. If you use a non-deterministic collation in a context where you need a deterministic one (like as a database collation, or if you defined the collation asDETERMINISTIC = TRUE
), PostgreSQL will break the tie by usingmemcmp
if the two strings compare equal, but are not identical.That means for example that if you use that collation for a database, you will get
'A1' < 'a1'
, becausememcmp
ranks the lower ASCII value ofA
before the higher one ofa
.See the code of
varstr_cmp()
insrc/backend/utils/adt/varlena.c
:Since you asked in your comment: if you mark a collation as
DETERMINISTIC
, strings that are byte-wise different will never compare as equal.