I am trying to sort a column title from the table door with the following PostgreSQL query:
SELECT title
FROM door
ORDER BY
CASE
WHEN title ~ '^d+' THEN CAST(regexp_replace(title, '[^0-9]', '', 'g') AS INTEGER)
ELSE NULL
END ASC,
CASE
WHEN title ~ '^d+' THEN regexp_replace(title, 'd+', '', 'g')
ELSE title
END ASC;
The query currently produces the following result:
"DR-1"
"DR-01"
"DR-02"
"DR-03"
"DR-04"
"DR-100"
"DR-1001"
"DR-101"
"DR-102"
"DR-104"
"Entrance-1"
"Entrance-2"
"MY-Prefix-10-Entrance-1"
"MY-Prefix-11-Entrance-1"
"MY-Prefix-19-Entrance-1"
"MY-Prefix-1-Entrance-1"
"MY-Prefix-20-Entrance-1"
"MY-Prefix-2-Entrance-1"
"MY-Prefix-3-Entrance-1"
"MY-Prefix-8-Entrance-1"
"MY-Prefix-9-Entrance-1"
"MY-Prefix-9-Entrance-2"
"MY-Prefix-9-Entrance-3"
However, I was expecting the result to be:
"DR-1"
"DR-01"
"DR-02"
"DR-03"
"DR-04"
"DR-100"
"DR-101"
"DR-102"
"DR-104"
"DR-1001"
"Entrance-1"
"Entrance-2"
"MY-Prefix-1-Entrance-1"
"MY-Prefix-2-Entrance-1"
"MY-Prefix-3-Entrance-1"
"MY-Prefix-8-Entrance-1"
"MY-Prefix-9-Entrance-1"
"MY-Prefix-9-Entrance-2"
"MY-Prefix-9-Entrance-3"
"MY-Prefix-10-Entrance-1"
"MY-Prefix-11-Entrance-1"
"MY-Prefix-19-Entrance-1"
"MY-Prefix-20-Entrance-1"
The query is not correctly ordering the rows where the title column contains mixed prefixes and numeric segments, particularly the entries with the MY-Prefix pattern. I suspect this is due to the logic in the ORDER BY clause not properly handling the sorting of multi-segment alphanumeric strings.
I want to:
Sort rows where the numeric segment is extracted from title
and ordered numerically.
Handle alphanumeric prefixes correctly so that entries like MY-Prefix-1-Entrance-1
appear before MY-Prefix-10-Entrance-1
.
2
Answers
The following query uses
REXEXP_REPLACE
to extract the segments fromtitle
:NULLIF
is used to address missing numeric segments that cause exceptions when attempting to cast an empty string to an integer. Ordering bytitle DESC
at the end of the list ensures that titles with leading zeros in numerically equivalent segments will sort based on the number of leading zeros.That’s one of the purposes of custom ICU collations. You can define a sorting method that treats numbers numerically, the way you wanted: demo at db<>fiddle
Then you just tell your
order by
to use it:The demo shows it’s also faster than trying to handle that using regex and conditionals.