skip to Main Content

In a simple table in PostgreSQL

CREATE TABLE "SOMESCHEMA".newtable (
    col1 varchar NULL
);

I inserted three rows with the values

"A 1"
"A~2"
"A 3"

Whe I make a query like

SELECT col1 FROM "SOMESCHEMA".newtable order by col1 desc;

the results are ordered in the following way:

"A 3"
"A~2"
"A 1"

Why isn’t A~2 be placed on top or bottom ?
Do I have to change a setting ? The collation of the column is en_US.UTF-8

2

Answers


  1. In ASCII table, the tilde character is represented by the decimal code 126, which is higher than the decimal codes of digits and capital/lowercase letters. However, when PostgreSQL uses UTF-8 for collation, it treats the tilde character differently in order to conform to linguistic rules.

    The tilde character is not generally used in English language text, so in many UTF-8 collations it’s treated as a secondary or tertiary difference. In other words, it’s less significant than a difference in letters or even accents on letters.

    Login or Signup to reply.
  2. I think with en_US.UTF-8 collation, which is based on the English language rules, the ordering is generally lexicographic where special characters, such as spaces and symbols, are considered before alphanumeric characters.
    In your case, the provided values"A 3", "A~2", and "A 1" are ordered as follows:

    1. "A 3" – The space character is considered before the tilde (~) symbol and alphanumeric characters.
    2. "A~2" – The tilde (~) symbol is considered after the space character but before alphanumeric characters.
    3. "A 1" – The space character is considered before alphanumeric characters.

    If you want to sort ignoring spaces and symbols, you may need to consider using a collation like en_US.UTF-8@icu, which is based on the International Components for Unicode (ICU) library.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search