I’m using a generative AI API to return text responses as JSON strings, which I intend to feed into an application in real time. The problem is that the JSON returned by the API often includes small errors, most commonly with double quotes. These syntax issues trigger errors in my Python code when I parse the string as JSON.
For instance, I have the following JSON string:
'{"test":"this is "test" of "a" test"","result":"your result is "out" in our website"}'
As you can see, the value for "test" contains multiple unescaped double quotes, so parsing it as JSON raises an error. What I want to do is use a regex to convert the inner double quotes to single quotes, so that the result looks as follows:
'{"test":"this is 'test' of 'a' test'", "result": "your result is 'out' in our website"}'
The best I can do is as follows:
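import re

def repl_call(m):
    preq = m.group(1)   # structural character (and any whitespace) before the opening quote
    qbody = m.group(2)  # everything between the outer quotes
    qbody = re.sub(r'"', "'", qbody)  # turn inner double quotes into single quotes
    return preq + '"' + qbody + '"'

# text holds the malformed JSON string shown above
print(re.sub(r'([:\[,{]\s*)"(.*?)"(?=\s*[:,\]}])', repl_call, text))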
The code above returns the intended result. However, if I add a comma inside the value, such as
{"test":"this is "test" of "a", test"","result":"your result is "out" in our website"}
…the code breaks and returns the following:
'{"test":"this is 'test' of 'a", test"","result":"your result is 'out' in our website"}'
🙁
I have tried improving my prompt (prompt engineering) to avoid the double quotes and return only a valid JSON string. This works to some degree, but I still encounter enough syntax errors that I have to retry the same prompt multiple times, which incurs unnecessary delays and costs.
My question is:
Is there a common function or regex pattern I can apply in Python to clean up syntax errors in my JSON string, specifically misplaced double quotes?
I’m open to a variety of suggestions, including Python packages that handle JSON string cleansing, and any advice on GenAI tools that enforce JSON output. I currently use Gemini, which I like a lot, but it doesn’t seem to allow JSON enforcement as explicitly as OpenAI’s API does.
Thank you in advance.
2 Answers
You mean like JSON mode? See the docs and the cookbook.
If you are requesting JSON back, you should be using response_mime_type, and then you will not have these parsing issues.
Just make sure that the JSON object you tell the model to use is itself valid JSON, with all the commas where they should be, or the model may build its output incorrectly.
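As a rough illustration of the response_mime_type suggestion, here is a minimal sketch using the google-generativeai Python SDK. The helper name generate_json and the model name "gemini-1.5-flash" are assumptions for the example, not something from the question, and the sketch assumes genai.configure(api_key=...) has already been called.

```python
import json

# Generation settings that ask Gemini to emit JSON instead of free text.
JSON_MODE_CONFIG = {"response_mime_type": "application/json"}


def generate_json(prompt, model_name="gemini-1.5-flash"):
    """Hypothetical helper: call Gemini in JSON mode and return the parsed object.

    The model name is an assumption; use whichever model you have access to.
    """
    # Imported here so this module still loads when the SDK is not installed.
    import google.generativeai as genai  # pip install google-generativeai

    # Assumes genai.configure(api_key=...) was called earlier in the program.
    model = genai.GenerativeModel(model_name, generation_config=JSON_MODE_CONFIG)
    response = model.generate_content(prompt)
    # In JSON mode the response text should already be parseable JSON.
    return json.loads(response.text)
```

The prompt should still describe the JSON shape you expect (keys and value types), since the MIME type alone only tells the model to answer in JSON.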
response schema
Another option would be to use a response schema.
See JSON mode.
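If you still have to repair strings that were generated without JSON mode, one heuristic is to tighten the lookahead in the regex from the question: treat a double quote as a closing quote only when it is followed by a colon, by a comma and another opening quote, or by a closing bracket. The sketch below is only that, a heuristic; the names repair_inner_quotes and loads_lenient are made up for the example. It will still fail on values that themselves contain sequences like `", "` or `":`, and it tries a normal parse first so that valid JSON with escaped quotes is left alone.

```python
import json
import re

# A quote counts as "closing" only when what follows looks structural:
# a colon (end of a key), a comma followed by another opening quote,
# or a closing bracket or brace.
PATTERN = re.compile(r'([:\[,{]\s*)"(.*?)"(?=\s*(?::|,\s*"|[\]}]))')


def repair_inner_quotes(text):
    """Replace double quotes inside string values with single quotes."""
    def repl(m):
        return m.group(1) + '"' + m.group(2).replace('"', "'") + '"'
    return PATTERN.sub(repl, text)


def loads_lenient(text):
    """Parse JSON normally, falling back to the heuristic repair on failure."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return json.loads(repair_inner_quotes(text))


bad = '{"test":"this is "test" of "a", test"","result":"your result is "out" in our website"}'
print(loads_lenient(bad))
```

On the comma example from the question, the quote after `a` is followed by a comma and then plain text rather than another quote, so it is no longer mistaken for the end of the value.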