skip to Main Content

I need to find a pattern which will give me a substring which follows logic to substring from the right side, after the third _.

What I want to achieve is to retrieve everything up until 3rd _ .

I tried this SQL, but it did not work

select 
    SUBSTRING('22D_XYZ_xy_100_xyz_123', 1, position('_' in REVERSE('22D_XYZ_xy_100_xyz_123')) - 1) as result;

It returns the 22D whereas I expect 22D_XYZ_xy

2

Answers


  1. You could try this query

    SELECT 
        REGEXP_SUBSTR('22D_XYZ_xy_100_xyz_123', '^[^_]+_[^_]+_[^_]+') as result;
    

    See demo here

    Login or Signup to reply.
  2. The phrase, "substring which follows logic to substring from the right side, after the third _," is difficult to understand. I infer that the intended behavior is to extract the subtring that includes all text before the third _ from the right and not all text before the third _ from the left. When searching from the left, there is also ambiguity as to expected behavior if there are less than three _ characters: should the entire string be returned or should the result be NULL? Unfortunately, the example input, 22D_XYZ_xy_100_xyz_123, and expected output, 22D_XYZ_xy, are consistent with all of these cases. The following query demonstrates all three interpretations:

    SELECT
      string,
      regexp_substr(string, '^.*(?=(_[^_]*){3}$)') as from_right,
      regexp_substr(string, '^([^_]*_){0,2}[^_]*') as from_left,
      regexp_substr(string, '^([^_]*_){2}[^_]*(?=_)') as from_left_requires_3
    FROM (
      VALUES ('22D_XYZ_xy_100_xyz_123'),
             ('1_2_3_4_5_6_7'),
             ('1_2_3_4_5_6'),
             ('1_2_3_4_5_'),
             ('_1_2_3_4_5'),
             ('1_2_3_4_5'),
             ('1_2_3_4'),
             ('1_2_3'),
             ('1_2_'),
             ('1_2'),
             ('1'),
             ('____'),
             ('___'),
             ('__'),
             ('_'),
             (''),
             (NULL)) t(string);
    

    Executing the query gives the following:

    string from_right from_left from_left_requires_3
    ’22D_XYZ_xy_100_xyz_123′ ’22D_XYZ_xy’ ’22D_XYZ_xy’ ’22D_XYZ_xy’
    ‘1_2_3_4_5_6_7’ ‘1_2_3_4’ ‘1_2_3’ ‘1_2_3’
    ‘1_2_3_4_5_6’ ‘1_2_3’ ‘1_2_3’ ‘1_2_3’
    ‘1_2_3_4_5_’ ‘1_2_3’ ‘1_2_3’ ‘1_2_3’
    ‘_1_2_3_4_5’ ‘_1_2’ ‘_1_2’ ‘_1_2’
    ‘1_2_3_4_5’ ‘1_2’ ‘1_2_3’ ‘1_2_3’
    ‘1_2_3_4’ ‘1’ ‘1_2_3’ ‘1_2_3’
    ‘1_2_3’ NULL ‘1_2_3’ NULL
    ‘1_2_’ NULL ‘1_2_’ NULL
    ‘1_2’ NULL ‘1_2’ NULL
    ‘1’ NULL ‘1’ NULL
    ‘____’ ‘_’ ‘__’ ‘__’
    ‘___’ ‘__’ ‘__’
    ‘__’ NULL ‘__’ NULL
    ‘_’ NULL ‘_’ NULL
    NULL NULL
    NULL NULL NULL NULL

    To search from the right, a positive lookahead zero-length assertion, (?=(_[^_]*){3}$), is used to check that the unmatched portion of the string begins with _ and includes exactly three _ characters. If the assertion is false, then NULL is returned.

    To find the left substring with no more than two _ characters, the regular expression matches up to two substings of zero or more non-_ characters followed by a single _ along with any non-_ characters prior to the next _ or the end of the string.

    Finding the left substring preceding a required third _ is similar to finding the left substring with no more than two _ characters, except that the occurrence specifier for the group is {2} instead of {0,2} and a positive lookahead zero-lengh assertion, (?=_) ensures that _ is the first unmatched character. As with the search from the right, if the assertion is false, then NULL is returned.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search