skip to Main Content

How to replace text between 2 characters MULTIPLE TIMES in a huge string (50,000 characters) in T-SQL?

I have a JSON output from my Azure Pipeline which we drop into a single cell in a table ready for the stored procedure to parse and process the data. However, one of the lines in the JSON looks like this

"body":"bxncucbuc "what is happening" cbibciuc",
"subject:"whatever" etc etc. .........  
"body":"bxncucbuc "what more things are happening" cbibciuc",
"subject:"whatever" etc etc. .........
"body":"bxncucbuc "what else is happening" cbibciuc",
"subject:"whatever" etc etc. 

The openjson cannot parse the script as the "what is happening" (or similar) looks like another column as it has the " ‘s in. So, I am looking to replace everything BETWEEN "body":" and "subject":, as actually I don’t need this data anyway.

I can get a single instance replaced using

select   
    stuff(jsonresponse, CHARINDEX('"body":"', jsonresponse), CHARINDEX('"subject":', jsonresponse) - CHARINDEX('"body":"', jsonresponse) + 1, '"bodyX":"replaced with a random piece of text","')
from 
    stage.JsonResponse

However

while (select CHARINDEX('"body":"<', jsonresponse) from stage.JsonResponse) != 0
begin 
    update stage.JsonResponse 
    set JsonResponse =  stuff(jsonresponse, CHARINDEX('"body":"', jsonresponse), CHARINDEX('"subject":', jsonresponse) - CHARINDEX('"body":"', jsonresponse) + 1, '"bodyX":"replaced with a random piece of text","')
    from stage.Mimecast_JsonResponse

    if (select CHARINDEX('"body":"<', jsonresponse) 
        from stage.JsonResponse) = 0
        BREAK
    else
        CONTINUE
END  

just runs forever. I have changed it to look for "body":"< in case it was inadvertently changing some script and then looking for itself, and changing that again etc.

Similarly, I have tried a CTE:

;WITH Cte(jsonresponse) AS 
(
    SELECT jsonresponse FROM #a
    UNION ALL
    SELECT STUFF(jsonresponse, CHARINDEX('"body":"', jsonresponse), CHARINDEX('"subject":"', jsonresponse) - CHARINDEX('"body":"', jsonresponse) + 1, '"body":"replaced with a random piece of text","')
    FROM Cte
    WHERE CHARINDEX('"body":"<', jsonresponse) != 0
)
SELECT jsonresponse 
FROM Cte
WHERE CHARINDEX('"body":"<', jsonresponse) = 0
OPTION (MAXRECURSION 0); `

but this too seems to run and run, but actually only changes the first instance.

My JSON is 50,000 characters long, and there are multiple instances in the json that I need to replace ALL OF THEM, and this is where I have struggled, how to change EVERY INSTANCE of the offending script.

Any help will be appreciated.

I am using T-SQL on an Azure SQL database.

2

Answers


  1. Trial an approach to locating the substring containing unwanted double quotes:

    SELECT 
        id
      , substring(json_data,char_start, char_end) bad_bit
      , replace(substring(json_data,char_start, char_end),'"','|') good_bit
    FROM test_table
    cross apply (
      select
           charindex('"body": "',json_data) + len('"body": "') as char_start
         , charindex('", "subject":',json_data) - len('", "subject":') as char_end
      ) ca1
    where isjson(json_data) = 0
    

    If that works reliably, then the following shuld also work to replace the inner set of double quotes with some other character (I used a pipe) below

    WITH CTE
    AS (
        SELECT id
            , json_data
            , char_start
            , char_end
            , substring(json_data, char_start, char_end) bad_bit
            , replace(substring(json_data, char_start, char_end), '"', '|') good_bit
        FROM test_table
        CROSS APPLY (
            SELECT charindex('"body": "', json_data) + len('"body": "') AS char_start
                , charindex('", "subject":', json_data) - len('", "subject":') AS char_end
            ) ca1
        WHERE isjson(json_data) = 0
        )
    UPDATE CTE
    SET json_data = STUFF(json_data, char_start, char_end, good_bit)
    WHERE bad_bit <> good_bit
    

    A tiny test case is here to trial: https://dbfiddle.uk/iorTQkR1

    SELECT id, b.* 
    FROM test_table
      CROSS APPLY (
        SELECT *
        FROM OpenJson(json_data)
        ) B
    
    id key value type
    1 body bxncucbuc |what is happening| cbibciuc 1
    1 subject other_value 1
    2 body bxncucbuc |what more things are happening| cbibciuc 1
    2 subject other_value 1
    3 body bxncucbuc |what else is happening| cbibciuc 1
    3 subject other_value 1

    fiddle

    Login or Signup to reply.
  2. Looking at the option of using a recursive CTE, I believe this should work – I did some experimenting and it appears (probably due to how the data gets amended and referenced or passed around internally) the most reliable way is to not affect the length of the string but just to replace characters in-situ.

    The following example works to find the sections between each body/subject element and replaces the content with a single repeating character without changing the length.

    Since you don’t need this content anyway you can then use replace to remove it completely.

    Does this help for your scenario?

    Demo DBFiddle

    with p as (
      select Id, jsonCol,
        CharIndex('"body"', jsonCol) as Bpos,
        CharIndex('"subject"', jsonCol) as Spos
      from t
      union all
      select Id, j,
        CharIndex('"body"', j, Spos) as Bpos,
        CharIndex('"subject"', j, Spos + 1) as Spos
      from p
      cross apply(values( Stuff(jsoncol, bpos + 8, spos - bpos - 10, Replicate('#', spos - bpos - 10))))n(j)
      where Bpos > 0 
    )
    select Replace(jsoncol, '#','')
    from p
    where bpos = 0;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search