skip to Main Content

This is based on a question previously asked which was deleted by the OP but it got me thinking and playing round with it, and I managed to do it without using a regex. The question is also a lesson in the perils of allowing users to enter free text!

However, my solution is quite cumbersome and I’ve wracked my brains to see if I could find a more elegant solution, but no joy.

So, I’m going to offer a bonus of 100 points to the person who can provide one. I’m not going to put my own solution forward yet.

A typical string might be like this:

'First order 437.3/10-87 16NY100013XX55 - Return'

So, the only thing you can be sure of is that the string (order_name) starts with '437.' and that it ends with the last digit in the string. The text following the order is arbitrary: its only constraint is that it can’t contain a digit!

So, the result for this particular example would be

'437.3/10-87 16NY100013XX55'

So, my table and data are below and also on the fiddle here:

CREATE TABLE t1 
(
  id INT NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  order_name TEXT NOT NULL
);

strings:

INSERT INTO t1 (order_name) VALUES
('First order 437.8/03-87 22LA190028YV7ER55'),
('Second order 437.8-03-87 22LA19'),
('First order 437.3/10-87 16NY100013XX55 - Return'),
('Order 21.02.2022 437.8/10-87 16WA239766'),
('437.8/10-87 16NY10023456YY78 - Paid'),
('First order (437.8/03-87 22LA190028)'),
('Visit 02.02.2023 Order 437.5/10-87 16DC107765X56 REFUND'),
('Visit 02.02.2023 Order 437.5/10-87 16DC1077657FFR56REFUND'),    
('Visit 02.02.2023 Order 437.5/10-87 16DC107765745 - Reorder');

Desired result:

Order Text
437.8/03-87 22LA190028YV7ER55
437.8-03-87 22LA19
437.3/10-87 16NY100013XX55
437.8/10-87 16WA239766
437.8/10-87 16NY10023456YY78
437.8/03-87 22LA190028
437.5/10-87 16DC107765X56
437.5/10-87 16DC1077657FFR56
437.5/10-87 16DC107765745

The fiddle is PostgreSQL 16, but I’ve checked and it works back to version 10. The question is PostgreSQL based, but elegant solutions from other systems will also get an upvote.

Please let me know if any further information is required.

4

Answers


  1. EDIT: Originally I was looking for the first digit after the word "Order "because I missed the part where the Order starts with "437". I’ve updated the answer but left the slightly more complex logic of finding the start position of the search

    This is a TSQL solution

    • I want to find the first and last digit positions within the string.
    • I’ll search for the first position either after the occurrence of the word "437" or the start of the string in its absence.
    • The last
      position is calculated by reversing the string
    • I’m using CROSS APPLY with VALUES so I can refer to the previous calculations without repeating the code (for readability)

    The only tricky thing should be off-by-one errors

    SELECT 
        *
    FROM dbo.t1 AS t
    CROSS APPLY (values ( COALESCE(NULLIF(PATINDEX('%437%', t.order_name), 0), 1))) AS ca1(start_pos) /* start search from beginning or first "order" */
    CROSS APPLY (values (
        ca1.start_pos + PATINDEX('%[0-9]%', SUBSTRING(t.order_name, ca1.start_pos, LEN(t.order_name))) -1 /* find first digit from start_pos */
        , LEN(t.order_name) - PATINDEX('%[0-9]%', REVERSE(t.order_name)) +1 /* find first digit in reversed string */
        )) AS ca2(first_digit_pos, last_digit_pos)
    CROSS APPLY (values ( SUBSTRING(t.order_name, ca2.first_digit_pos, ca2.last_digit_pos-ca2.first_digit_pos +1))) AS ca3(isolated_order)
    

    Result
    enter image description here

    Login or Signup to reply.
  2. In order to extract string without using regular expressions, I would use string functions like POSITION, and SUBSTRING. I have two solutions:

    Solution 1:

    WITH ExtractedOrders AS (
      SELECT
        SUBSTRING(order_name FROM POSITION('437.' IN order_name)) AS raw_order_text
      FROM t1
    )
    SELECT
      TRIM(
        BOTH ' - ' FROM
        CASE
          WHEN raw_order_text LIKE '%Paid' THEN LEFT(raw_order_text, LENGTH(raw_order_text) - LENGTH('Paid'))
          WHEN raw_order_text LIKE '%REFUND' THEN LEFT(raw_order_text, LENGTH(raw_order_text) - LENGTH('Refund'))
          WHEN raw_order_text LIKE '%Reorder' THEN LEFT(raw_order_text, LENGTH(raw_order_text) - LENGTH('Reorder'))
          WHEN raw_order_text LIKE '%Return' THEN LEFT(raw_order_text, LENGTH(raw_order_text) - LENGTH('Return'))
          WHEN raw_order_text LIKE '%)' THEN LEFT(raw_order_text, LENGTH(raw_order_text) - LENGTH(')'))
          ELSE raw_order_text
        END
      ) AS "Order Text"
    FROM ExtractedOrders;
    

    Solution 2:

    WITH ExtractedOrders AS (
      SELECT
        id,
        SUBSTRING(order_name FROM POSITION('437.' IN order_name)) AS raw_order_text
      FROM t1
    )
    SELECT
      TRIM(BOTH ' -' FROM cleaned_order_text) AS "Order Text"
    FROM (
      SELECT
        id,
        TRIM(BOTH ' -' FROM
          REPLACE(
            REPLACE(
              REPLACE(
                REPLACE(
                  REPLACE(raw_order_text, 'Paid', ''), 
                  'REFUND', ''
                ), 
                'Reorder', ''
              ), 
              'Return', ''
            ), 
            ')', ''
          )
        ) AS cleaned_order_text
      FROM ExtractedOrders
    ) AS cleaned_orders;
    

    Hope my solutions are aligned with yours, and this is what you were looking for.

    Login or Signup to reply.
  3. Find the positions of ‘437.’ and the last digit in the order_name string (with a scalar subquery) to use with substring function.

    select substring(
      order_name, 
      position('437.' in order_name), 
      (
       select max(o)::integer 
       from string_to_table(order_name, null) with ordinality as t(c, o)
       where c between '0' and '9'
      ) - position('437.' in order_name) + 1
    ) from t1;
    

    Fiddle

    Login or Signup to reply.
  4. Just selecting the substring starting at the position of 437., and ending on the second space after the start of that:

    SELECT
      order_name,
      substring(attempt1,1,case when p2=0 then 500 else p1+p2 end) as substring
    FROM (
      SELECT 
        order_name,
        attempt1,
        position(' ' in attempt1) as p1,
        position(' ' in substring(attempt1, 1+position(' ' in attempt1))) as p2
      FROM (
        SELECT
          order_name, 
          substring(order_name,position('437.' IN order_name)) as attempt1
        FROM t1
      )
    )
    

    output:

    order_name substring
    First order 437.8/03-87 22LA190028YV7ER55 437.8/03-87 22LA190028YV7ER55
    Second order 437.8-03-87 22LA19 437.8-03-87 22LA19
    First order 437.3/10-87 16NY100013XX55 – Return 437.3/10-87 16NY100013XX55
    Order 21.02.2022 437.8/10-87 16WA239766 437.8/10-87 16WA239766
    437.8/10-87 16NY10023456YY78 – Paid 437.8/10-87 16NY10023456YY78
    First order (437.8/03-87 22LA190028) 437.8/03-87 22LA190028)
    Visit 02.02.2023 Order 437.5/10-87 16DC107765X56 REFUND v437.5/10-87 16DC107765X56
    Visit 02.02.2023 Order 437.5/10-87 16DC1077657FFR56REFUND 437.5/10-87 16DC1077657FFR56REFUND
    Visit 02.02.2023 Order 437.5/10-87 16DC107765745 – Reorder 437.5/10-87 16DC107765745

    see: DBFIDDLE

    EDIT: Removing the REFUND, or the ), using regular expressions is easy, just to:

    regexp_replace(substring(attempt1,1,case when p2=0 then 500 else p1+p2 end),'[^0-9]*$','','g') as substring2
    

    When doing this without regular expression, might need a user defined function, especially when you do not want to hard code stuff like REFUND or REORDER

    see: DBFIDDLE

    EDIT2: To Remove letters, and other non-numeric characters, from the right side of the string, I created this function:

    CREATE OR REPLACE FUNCTION public.trimlettersfromrightofstring(s varchar(200))
        RETURNS varchar(200)
        LANGUAGE sql
    AS $function$
    with recursive abc as (
          SELECT s as a
        ), abc2(a,b,c) as (
          select 
             a,
             rtrim(a,right(trim(a),1)) as b, 
             1 as c 
          from abc
          
          union all
          
          select 
             b,
             rtrim(trim(b),right(trim(b),1)), 
             c+1
          from abc2 
          where right(trim(b),1) <'0' or right(trim(b),1) > '9'  
        )
        SELECT b from abc2 order by c desc limit 1;
    $function$
    

    This can be used as SELECT trimlettersfromrightofstring('1234abcdef5hijk');, which returns 1234abcdef5

    This is the link to the adapted DBFIDDLE, which produced the desired results.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search