The Issue:
I am trying to select from multiple tables, Table A and B but this is mainly regarding the Only column I am choosing from Table B. The values in this column contain an Array of string.
Aim is to get get rid of any [] and then look at string. There are two main types of strings, one that contains only text, and are NOT comma separated, and others that has some combinations of text and comma.
What is Required:
For condition where string does not contain any commas then simply return the text value. For other cases where string does contain single or multiple commas return the values after the first comma. In cases where there are more than 2 or more commas then need to return string after the first comma and before 2nd comma.
Please look at the code for further explanation:
DROP TABLE IF EXISTS
[dbo].[multi_string_db]
GO
CREATE TABLE [dbo].[multi_string_db] (
multi_string nvarchar (256) NULL
)
INSERT INTO [dbo].[multi_string_db] ( multi_string)
VALUES ('[Additional time required, Time requested]')
, ('[Additional time required, Document requested]')
, ('[Additional time required, Missing documents - Personal, Other]')
, ('[Additional time required, Missing documents - Personal]')
, ('Additional time required')
, ('Document Requested')
, ('Missing FPA/evidence')
, ('Missing documents - Office')
, ('Missing documents - Personal')
, ('Other')
, ('Referred to Decision Maker Team')
, ('Target date error')
Desired Results:
Desired_Output |
---|
Time requested |
Document requested |
Missing documents – Personal |
Missing documents – Personal |
Additional time required |
Document Requested |
Missing FPA/evidence |
Missing documents – Office |
Missing documents – Personal |
Other |
Referred to Decision Maker Team |
Target date error |
**What I have tried so Far: **
SELECT
LTRIM(RTRIM(
CASE
WHEN CHARINDEX('[', multi_string) > 0 AND CHARINDEX(']', multi_string) > 0
THEN
CASE
WHEN CHARINDEX(',', multi_string) > 0 AND CHARINDEX(',', multi_string, CHARINDEX(',', multi_string) + 1) > 0
THEN SUBSTRING(multi_string, CHARINDEX(',', multi_string) + 1, CHARINDEX(',', multi_string, CHARINDEX(',', multi_string) + 1) - CHARINDEX(',', multi_string) - 1)
WHEN CHARINDEX(',', multi_string) > 0
THEN SUBSTRING(multi_string, CHARINDEX(',', multi_string) + 1, CHARINDEX(']', multi_string) - CHARINDEX(',', multi_string) - 1)
ELSE SUBSTRING(multi_string, CHARINDEX('[', multi_string) + 1, CHARINDEX(']', multi_string) - CHARINDEX('[', multi_string) - 1)
END
ELSE
LTRIM(RTRIM(multi_string)) -- If no brackets, return the original string
END
)) AS Result
FROM dbo.[multi_string_db]
The Issue (Limitations):
The logic works fine but only when [ ] are present, which definitely is the case with current data, however if there were new data then this logic will fail.
For Example:
If we have following string ‘Some time required, Received documents – Unchecked, Misc’ then the logic fails due to absence of [ ] around the string.
I hope this makes sense and wonder if someone can help me overcome this limitation.
2
Answers
Please try the following solution based on tokenization via XML and XQuery. It will work starting from SQL Server 2017 onwards.
No need to do any string parsing via endless
CHARINDEX()
,SUBSTRING()
, and other functions.Notable points:
CROSS APPLY
is tokenizing a string via XML.CROSS APPLY
is counting # of tokens in a string of tokens separated by comma.SELECT
clause has a simpleCASE
statement to retrieve a correct token in a string of tokens based on token count.SQL
Output
I’ve added in your ‘problem’ data in this example
If ignores the
[
and]
to start with an d just finds the parts required for the result, then trim the result of that and replace any remaining[
and]
.If you are using SQL Server 2022 you could use
string_split()
with the ordinal option (I think) and just extract the 1st or 2nd ordinal. It would be really simple then.