How to replace text between 2 characters MULTIPLE TIMES in a huge string (50,000 characters) in T-SQL?
I have a JSON output from my Azure Pipeline which we drop into a single cell in a table ready for the stored procedure to parse and process the data. However, one of the lines in the JSON looks like this
"body":"bxncucbuc "what is happening" cbibciuc",
"subject:"whatever" etc etc. .........
"body":"bxncucbuc "what more things are happening" cbibciuc",
"subject:"whatever" etc etc. .........
"body":"bxncucbuc "what else is happening" cbibciuc",
"subject:"whatever" etc etc.
The openjson
cannot parse the script as the "what is happening" (or similar) looks like another column as it has the " ‘s in. So, I am looking to replace everything BETWEEN "body":" and "subject":, as actually I don’t need this data anyway.
I can get a single instance replaced using
select
stuff(jsonresponse, CHARINDEX('"body":"', jsonresponse), CHARINDEX('"subject":', jsonresponse) - CHARINDEX('"body":"', jsonresponse) + 1, '"bodyX":"replaced with a random piece of text","')
from
stage.JsonResponse
However
while (select CHARINDEX('"body":"<', jsonresponse) from stage.JsonResponse) != 0
begin
update stage.JsonResponse
set JsonResponse = stuff(jsonresponse, CHARINDEX('"body":"', jsonresponse), CHARINDEX('"subject":', jsonresponse) - CHARINDEX('"body":"', jsonresponse) + 1, '"bodyX":"replaced with a random piece of text","')
from stage.Mimecast_JsonResponse
if (select CHARINDEX('"body":"<', jsonresponse)
from stage.JsonResponse) = 0
BREAK
else
CONTINUE
END
just runs forever. I have changed it to look for "body":"< in case it was inadvertently changing some script and then looking for itself, and changing that again etc.
Similarly, I have tried a CTE:
;WITH Cte(jsonresponse) AS
(
SELECT jsonresponse FROM #a
UNION ALL
SELECT STUFF(jsonresponse, CHARINDEX('"body":"', jsonresponse), CHARINDEX('"subject":"', jsonresponse) - CHARINDEX('"body":"', jsonresponse) + 1, '"body":"replaced with a random piece of text","')
FROM Cte
WHERE CHARINDEX('"body":"<', jsonresponse) != 0
)
SELECT jsonresponse
FROM Cte
WHERE CHARINDEX('"body":"<', jsonresponse) = 0
OPTION (MAXRECURSION 0); `
but this too seems to run and run, but actually only changes the first instance.
My JSON is 50,000 characters long, and there are multiple instances in the json that I need to replace ALL OF THEM, and this is where I have struggled, how to change EVERY INSTANCE of the offending script.
Any help will be appreciated.
I am using T-SQL on an Azure SQL database.
2
Answers
Trial an approach to locating the substring containing unwanted double quotes:
If that works reliably, then the following shuld also work to replace the inner set of double quotes with some other character (I used a pipe) below
A tiny test case is here to trial: https://dbfiddle.uk/iorTQkR1
fiddle
Looking at the option of using a recursive CTE, I believe this should work – I did some experimenting and it appears (probably due to how the data gets amended and referenced or passed around internally) the most reliable way is to not affect the length of the string but just to replace characters in-situ.
The following example works to find the sections between each body/subject element and replaces the content with a single repeating character without changing the length.
Since you don’t need this content anyway you can then use replace to remove it completely.
Does this help for your scenario?
Demo DBFiddle