skip to Main Content

I am trying to sort a column title from the table door with the following PostgreSQL query:

SELECT title
FROM door
ORDER BY
  CASE 
    WHEN title ~ '^d+' THEN CAST(regexp_replace(title, '[^0-9]', '', 'g') AS INTEGER)
    ELSE NULL
  END ASC,
  CASE 
    WHEN title ~ '^d+' THEN regexp_replace(title, 'd+', '', 'g')
    ELSE title
  END ASC;

The query currently produces the following result:

"DR-1"
"DR-01"
"DR-02"
"DR-03"
"DR-04"
"DR-100"
"DR-1001"
"DR-101"
"DR-102"
"DR-104"
"Entrance-1"
"Entrance-2"
"MY-Prefix-10-Entrance-1"
"MY-Prefix-11-Entrance-1"
"MY-Prefix-19-Entrance-1"
"MY-Prefix-1-Entrance-1"
"MY-Prefix-20-Entrance-1"
"MY-Prefix-2-Entrance-1"
"MY-Prefix-3-Entrance-1"
"MY-Prefix-8-Entrance-1"
"MY-Prefix-9-Entrance-1"
"MY-Prefix-9-Entrance-2"
"MY-Prefix-9-Entrance-3"

However, I was expecting the result to be:

"DR-1"
"DR-01"
"DR-02"
"DR-03"
"DR-04"
"DR-100"
"DR-101"
"DR-102"
"DR-104"
"DR-1001"
"Entrance-1"
"Entrance-2"
"MY-Prefix-1-Entrance-1"
"MY-Prefix-2-Entrance-1"
"MY-Prefix-3-Entrance-1"
"MY-Prefix-8-Entrance-1"
"MY-Prefix-9-Entrance-1"
"MY-Prefix-9-Entrance-2"
"MY-Prefix-9-Entrance-3"
"MY-Prefix-10-Entrance-1"
"MY-Prefix-11-Entrance-1"
"MY-Prefix-19-Entrance-1"
"MY-Prefix-20-Entrance-1"

The query is not correctly ordering the rows where the title column contains mixed prefixes and numeric segments, particularly the entries with the MY-Prefix pattern. I suspect this is due to the logic in the ORDER BY clause not properly handling the sorting of multi-segment alphanumeric strings.

I want to:

Sort rows where the numeric segment is extracted from title and ordered numerically.
Handle alphanumeric prefixes correctly so that entries like MY-Prefix-1-Entrance-1 appear before MY-Prefix-10-Entrance-1.

2

Answers


  1. The following query uses REXEXP_REPLACE to extract the segments from title:

    WITH
      door (title) AS (
        VALUES
          ('DR-1'),
          ('DR-01'),
          ('DR-02'),
          ('DR-03'),
          ('DR-04'),
          ('DR-100'),
          ('DR-1001'),
          ('DR-101'),
          ('DR-102'),
          ('DR-104'),
          ('Entrance-1'),
          ('Entrance-2'),
          ('MY-Prefix-10-Entrance-1'),
          ('MY-Prefix-11-Entrance-1'),
          ('MY-Prefix-19-Entrance-1'),
          ('MY-Prefix-1-Entrance-1'),
          ('MY-Prefix-20-Entrance-1'),
          ('MY-Prefix-2-Entrance-1'),
          ('MY-Prefix-3-Entrance-1'),
          ('MY-Prefix-8-Entrance-1'),
          ('MY-Prefix-9-Entrance-1'),
          ('MY-Prefix-9-Entrance-2'),
          ('MY-Prefix-9-Entrance-3')
      )
    SELECT
      title
    FROM
      door
    ORDER BY
      REGEXP_REPLACE(title, '^(D*).*', '1') NULLS FIRST,
      NULLIF(REGEXP_REPLACE(title, '^D*(d*).*', '1'), '')::INTEGER NULLS FIRST,
      REGEXP_REPLACE(title, '^D*d*(D*).*', '1') NULLS FIRST,
      NULLIF(REGEXP_REPLACE(title, '^D*d*D*(d*).*', '1'), '')::INTEGER NULLS FIRST,
      title DESC;
    

    NULLIF is used to address missing numeric segments that cause exceptions when attempting to cast an empty string to an integer. Ordering by title DESC at the end of the list ensures that titles with leading zeros in numerically equivalent segments will sort based on the number of leading zeros.

    Login or Signup to reply.
  2. That’s one of the purposes of custom ICU collations. You can define a sorting method that treats numbers numerically, the way you wanted: demo at db<>fiddle

    CREATE COLLATION numerical( provider=icu
                               ,deterministic=false
                               ,locale='und-u-kn' );
    

    Then you just tell your order by to use it:

    SELECT title
    FROM door
    ORDER BY title COLLATE numerical;
    

    The demo shows it’s also faster than trying to handle that using regex and conditionals.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search