In a simple table in PostgreSQL
CREATE TABLE "SOMESCHEMA".newtable (
col1 varchar NULL
);
I inserted three rows with the values
"A 1"
"A~2"
"A 3"
Whe I make a query like
SELECT col1 FROM "SOMESCHEMA".newtable order by col1 desc;
the results are ordered in the following way:
"A 3"
"A~2"
"A 1"
Why isn’t A~2
be placed on top or bottom ?
Do I have to change a setting ? The collation of the column is en_US.UTF-8
2
Answers
In ASCII table, the tilde character is represented by the decimal code 126, which is higher than the decimal codes of digits and capital/lowercase letters. However, when PostgreSQL uses UTF-8 for collation, it treats the tilde character differently in order to conform to linguistic rules.
The tilde character is not generally used in English language text, so in many UTF-8 collations it’s treated as a secondary or tertiary difference. In other words, it’s less significant than a difference in letters or even accents on letters.
I think with
en_US.UTF-8
collation, which is based on the English language rules, the ordering is generally lexicographic where special characters, such as spaces and symbols, are considered before alphanumeric characters.In your case, the provided values
"A 3"
,"A~2"
, and"A 1"
are ordered as follows:"A 3"
– The space character is considered before the tilde (~
) symbol and alphanumeric characters."A~2"
– The tilde (~
) symbol is considered after the space character but before alphanumeric characters."A 1"
– The space character is considered before alphanumeric characters.If you want to sort ignoring spaces and symbols, you may need to consider using a collation like
en_US.UTF-8@icu
, which is based on the International Components for Unicode (ICU) library.