skip to Main Content

The Issue:

I am trying to select from multiple tables, Table A and B but this is mainly regarding the Only column I am choosing from Table B. The values in this column contain an Array of string.

Aim is to get get rid of any [] and then look at string. There are two main types of strings, one that contains only text, and are NOT comma separated, and others that has some combinations of text and comma.

What is Required:

For condition where string does not contain any commas then simply return the text value. For other cases where string does contain single or multiple commas return the values after the first comma. In cases where there are more than 2 or more commas then need to return string after the first comma and before 2nd comma.

Please look at the code for further explanation:

DROP TABLE IF EXISTS

[dbo].[multi_string_db]

GO

CREATE TABLE [dbo].[multi_string_db] (

multi_string nvarchar (256)  NULL

)

INSERT INTO [dbo].[multi_string_db] ( multi_string)

VALUES ('[Additional time required, Time requested]')

, ('[Additional time required, Document requested]')

, ('[Additional time required, Missing documents - Personal, Other]')

, ('[Additional time required, Missing documents - Personal]')

, ('Additional time required')

, ('Document Requested')

, ('Missing FPA/evidence')

, ('Missing documents - Office')

, ('Missing documents - Personal')

, ('Other')

, ('Referred to Decision Maker Team')

, ('Target date error')

Desired Results:

Desired_Output
Time requested
Document requested
Missing documents – Personal
Missing documents – Personal
Additional time required
Document Requested
Missing FPA/evidence
Missing documents – Office
Missing documents – Personal
Other
Referred to Decision Maker Team
Target date error

**What I have tried so Far: **

SELECT 
    LTRIM(RTRIM(
        CASE 
            WHEN CHARINDEX('[', multi_string) > 0 AND CHARINDEX(']', multi_string) > 0
            THEN 
                CASE 
                    WHEN CHARINDEX(',', multi_string) > 0 AND CHARINDEX(',', multi_string, CHARINDEX(',', multi_string) + 1) > 0
                    THEN SUBSTRING(multi_string, CHARINDEX(',', multi_string) + 1, CHARINDEX(',', multi_string, CHARINDEX(',', multi_string) + 1) - CHARINDEX(',', multi_string) - 1)
                    WHEN CHARINDEX(',', multi_string) > 0
                    THEN SUBSTRING(multi_string, CHARINDEX(',', multi_string) + 1, CHARINDEX(']', multi_string) - CHARINDEX(',', multi_string) - 1)
                    ELSE SUBSTRING(multi_string, CHARINDEX('[', multi_string) + 1, CHARINDEX(']', multi_string) - CHARINDEX('[', multi_string) - 1)
                END
            ELSE
                LTRIM(RTRIM(multi_string)) -- If no brackets, return the original string
        END
    )) AS Result
            FROM dbo.[multi_string_db]

The Issue (Limitations):

The logic works fine but only when [ ] are present, which definitely is the case with current data, however if there were new data then this logic will fail.

For Example:
If we have following string ‘Some time required, Received documents – Unchecked, Misc’ then the logic fails due to absence of [ ] around the string.

I hope this makes sense and wonder if someone can help me overcome this limitation.

2

Answers


  1. Please try the following solution based on tokenization via XML and XQuery. It will work starting from SQL Server 2017 onwards.

    No need to do any string parsing via endless CHARINDEX(), SUBSTRING(), and other functions.

    Notable points:

    • 1st CROSS APPLY is tokenizing a string via XML.
    • 2nd CROSS APPLY is counting # of tokens in a string of tokens separated by comma.
    • SELECT clause has a simple CASE statement to retrieve a correct token in a string of tokens based on token count.

    SQL

    -- DDL and sample data population, start
    DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, multi_string NVARCHAR(256));
    INSERT INTO @tbl( multi_string) VALUES 
    ('[Additional time required, Time requested]'), 
    ('[Additional time required, Document requested]'),
    ('[Additional time required, Missing documents - Personal, Other]'),
    ('[Additional time required, Missing documents - Personal]'),
    ('Additional time required'),
    ('Document Requested'),
    ('Missing FPA/evidence'),
    ('Missing documents - Office'),
    ('Missing documents - Personal'),
    ('Other'),
    ('Referred to Decision Maker Team'),
    ('Target date error');
    -- DDL and sample data population, end
    
    DECLARE @separator CHAR(1) = ',';
    
    SELECT ID, t.multi_string --, c, t2.token_count -- uncomment to see XML and the token count
        , result = CASE
            WHEN t2.token_count = 1 THEN TRIM(c.value('(/root/r/text())[1]', 'NVARCHAR(100)'))
            WHEN t2.token_count > 1 THEN TRIM(c.value('(/root/r[2]/text())[1]', 'NVARCHAR(100)'))
            ELSE 'Alarm! Some edge case.'
        END 
    FROM @tbl AS t
    CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
        REPLACE(TRIM('[]' FROM multi_string), @separator, ']]></r><r><![CDATA[') + 
        ']]></r></root>' AS XML)) AS t1(c)
    CROSS APPLY (SELECT c.value('count(/root/r)', 'INT')) AS t2(token_count);
    

    Output

    multi_string result
    [Additional time required, Time requested] Time requested
    [Additional time required, Document requested] Document requested
    [Additional time required, Missing documents – Personal, Other] Missing documents – Personal
    [Additional time required, Missing documents – Personal] Missing documents – Personal
    Additional time required Additional time required
    Document Requested Document Requested
    Missing FPA/evidence Missing FPA/evidence
    Missing documents – Office Missing documents – Office
    Missing documents – Personal Missing documents – Personal
    Other Other
    Referred to Decision Maker Team Referred to Decision Maker Team
    Target date error Target date error
    Login or Signup to reply.
  2. I’ve added in your ‘problem’ data in this example

    DROP TABLE IF EXISTS #multi_string_db
    
    CREATE TABLE #multi_string_db (
        multi_string nvarchar (256)  NULL
    )
    INSERT INTO #multi_string_db ( multi_string)
    VALUES ('[Additional time required, Time requested]')
    , ('[Additional time required, Document requested]')
    , ('[Additional time required, Missing documents - Personal, Other]')
    , ('[Additional time required, Missing documents - Personal]')
    , ('Additional time required')
    , ('Document Requested')
    , ('Missing FPA/evidence')
    , ('Missing documents - Office')
    , ('Missing documents - Personal')
    , ('Other')
    , ('Referred to Decision Maker Team')
    , ('Target date error')
    , ('Some time required, Received documents - Unchecked, Misc')
    
    
    SELECT 
            multi_string, 
            Result = 
                    REPLACE(REPLACE(
                    LTRIM(RTRIM(
                    SUBSTRING(  multi_string
                            , CHARINDEX(',', multi_string)+1
                            ,  IIF(CHARINDEX(',', multi_string, CHARINDEX(',', multi_string)+1) = 0
                                                , LEN(multi_string) -  CHARINDEX(',', multi_string)
                                                , CHARINDEX(',', multi_string, CHARINDEX(',', multi_string)+1) - CHARINDEX(',', multi_string) -1
                                                )
                            )
                    ))
                    , '[', '')
                    , ']', '')
        FROM #multi_string_db
    

    If ignores the [ and ] to start with an d just finds the parts required for the result, then trim the result of that and replace any remaining [ and ].

    If you are using SQL Server 2022 you could use string_split() with the ordinal option (I think) and just extract the 1st or 2nd ordinal. It would be really simple then.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search