PostgreSQL sort order affected by succeeding characters

ChristianAlis
January 30, 2023
207 views
0 votes
2 Answers

I want to sort test last name and test2 last name so that the former comes before the latter. My understanding is that each character is compared from left to right until they differ and therefore the characters after those first differing characters do not matter anymore. However, as shown below, test comes before test2 but as soon as I append another character, the order changes. Why does this happen? What collation should I use to get the desired order? Note that converting them to bytea would yield the desired order.

test=# SELECT 'test last name' < 'test2 last name' COLLATE "en_US";
 ?column?
----------
 f
(1 row)

test=# SELECT 'test last' < 'test2 last' COLLATE "en_US";
 ?column?
----------
 f
(1 row)

test=# SELECT 'test ' < 'test2 ' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test ' < 'test2' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test' < 'test2' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test  ' < 'test2' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test  ' < 'test2 ' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test  ' < 'test2  ' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test  ' < 'test2 ' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test  ' < 'test2 l' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test ' < 'test2 l' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test l' < 'test2 l' COLLATE "en_US";
 ?column?
----------
 f
(1 row)

test=# SELECT 'test l' < 'test2l' COLLATE "en_US";
 ?column?
----------
 f
(1 row)

test=# SELECT 'test ' < 'test2l' COLLATE "en_US";
 ?column?
----------
 t
(1 row)

test=# SELECT 'test last name'::bytea < 'test2 last name'::bytea;
 ?column? 
----------
 t
(1 row)

Tags: collation postgresql

Answers

- LaurenzAlbe
- January 30, 2023 at 6:24 am
- 0 votes
0
That’s how natural language collations work. If you want to compare character by character and have the space character be like other characters, use the C collation:
```
SELECT 'test last name' < 'test2 last name' COLLATE "C";

 ?column? 
══════════
 t
(1 row)
```
But don’t complain if 'Z' < 'a' …
Login or Signup to reply.

- jian
- January 30, 2023 at 8:05 am
- 0 votes
0
white space is special character in ICU collation.

see demo: https://www.unicode.org/reports/tr10/#Variable_Weighting_Examples
also here: http://www.unicode.org/reports/tr35/tr35-collation.html#table-collation-settings
simple explanation: https://unicode-org.github.io/icu/userguide/collation/customization/ignorepunct.html#shift-trimmed

You can following test:
```
CREATE COLLATION coll_shifted(provider = icu, locale = 'en-u-ka-shifted');
CREATE COLLATION coll_noignore(provider = icu, locale = 'en-u-ka-noignore');

SELECT 'test last name' < 'test2 last name' COLLATE coll_shifted
union all
SELECT 'test last name' < 'test2 last name' COLLATE coll_noignore
union all
SELECT 'test last name' < 'test2 last name' COLLATE "en_US";
```
if you just want compare by code pointer, you can use COLLATE "C" or COLLATE "POSIX".
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.