I want to sort test last name
and test2 last name
so that the former comes before the latter. My understanding is that each character is compared from left to right until they differ and therefore the characters after those first differing characters do not matter anymore. However, as shown below, test
comes before test2
but as soon as I append another character, the order changes. Why does this happen? What collation should I use to get the desired order? Note that converting them to bytea
would yield the desired order.
test=# SELECT 'test last name' < 'test2 last name' COLLATE "en_US";
?column?
----------
f
(1 row)
test=# SELECT 'test last' < 'test2 last' COLLATE "en_US";
?column?
----------
f
(1 row)
test=# SELECT 'test ' < 'test2 ' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test' < 'test2' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2 ' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2 ' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2 ' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2 l' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test ' < 'test2 l' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test l' < 'test2 l' COLLATE "en_US";
?column?
----------
f
(1 row)
test=# SELECT 'test l' < 'test2l' COLLATE "en_US";
?column?
----------
f
(1 row)
test=# SELECT 'test ' < 'test2l' COLLATE "en_US";
?column?
----------
t
(1 row)
test=# SELECT 'test last name'::bytea < 'test2 last name'::bytea;
?column?
----------
t
(1 row)
2
Answers
That’s how natural language collations work. If you want to compare character by character and have the space character be like other characters, use the C collation:
But don’t complain if
'Z' < 'a'
…white space is special character in ICU collation.
see demo: https://www.unicode.org/reports/tr10/#Variable_Weighting_Examples
also here: http://www.unicode.org/reports/tr35/tr35-collation.html#table-collation-settings
simple explanation: https://unicode-org.github.io/icu/userguide/collation/customization/ignorepunct.html#shift-trimmed
You can following test:
if you just want compare by code pointer, you can use COLLATE "C" or COLLATE "POSIX".