ashap551
August 15, 2024

I’m using a Generative AI API to return text responses as JSON strings, which I intend to feed into an application in real time. The problem is that the JSON returned by the GenAI API often includes small errors, most commonly with double quotes. These syntax errors raise exceptions in my Python code when I parse the string as JSON.

For instance, I have the following JSON string:
'{"test":"this is "test" of "a" test"","result":"your result is "out" in our website"}'

As you can see, the value for "test" contains multiple unescaped double quotes, so parsing it as JSON raises an error. What I want to do is use regex to convert the inner double quotes to single quotes, so that the result looks as follows:
'{"test":"this is 'test' of 'a' test'", "result": "your result is 'out' in our website"}'

The best I can do is as follows:

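import re

text = '{"test":"this is "test" of "a" test"","result":"your result is "out" in our website"}'

def repl_call(m):
    preq = m.group(1)    # delimiter (and any whitespace) before the opening quote
    qbody = m.group(2)   # everything between the outer quotes
    qbody = re.sub(r'"', "'", qbody)   # inner double quotes become single quotes
    return preq + '"' + qbody + '"'

# A string starts after one of : [ , { and ends at a quote followed by one of : , ] }
print(re.sub(r'([:\[,{]\s*)"(.*?)"(?=\s*[:,\]}])', repl_call, text))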

The code above returns the intended result. However, if I add a comma inside the value, such as
{"test":"this is "test" of "a", test"","result":"your result is "out" in our website"}

…the code breaks and returns the following:
'{"test":"this is 'test' of 'a", test"","result":"your result is 'out' in our website"}'

🙁
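
To make the failure concrete, here is a small check. The helper names `fix_quotes` and `is_valid_json` are introduced only for this illustration, and `json.loads` is used as the validity test. It runs the same pattern on both inputs and reports whether the result parses:

```python
import json
import re

# Same pattern as above: a string ends at a quote followed by one of : , ] }
PATTERN = re.compile(r'([:\[,{]\s*)"(.*?)"(?=\s*[:,\]}])')


def fix_quotes(text):
    """Replace double quotes inside each matched string with single quotes."""
    def repl_call(m):
        return m.group(1) + '"' + m.group(2).replace('"', "'") + '"'
    return PATTERN.sub(repl_call, text)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


without_comma = '{"test":"this is "test" of "a" test"","result":"your result is "out" in our website"}'
with_comma = '{"test":"this is "test" of "a", test"","result":"your result is "out" in our website"}'

print(is_valid_json(fix_quotes(without_comma)))  # True
print(is_valid_json(fix_quotes(with_comma)))     # False
```

The lazy `(.*?)` stops at the first quote that is followed by a comma, so `"a",` is taken as the end of the value and the rest of the text is left unrepaired.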

So far I have tried to improve my prompt (prompt engineering) so that the model avoids the inner double quotes and returns only a valid JSON string. This works to some degree, but I still hit enough syntax errors that I have to retry the same prompt multiple times, which incurs unnecessary delays and costs.

My question is:
Is there a common function or regex pattern I can apply in Python to fix my JSON string so that syntax errors are cleaned up, specifically misplaced double quotes?

I’m open to a variety of suggestions, including Python packages that deal with JSON string cleansing, or advice on GenAI tools that enforce JSON output. I presently use Gemini, which I like a lot, but it doesn’t seem to allow JSON enforcement as explicitly as OpenAI’s API does.

Thank you in advance.

Answers

- MarkMcDonald
- August 15, 2024 at 7:22 am
Quoting the question: "But doesn’t allow JSON enforcement"

You mean like JSON mode? See the Gemini API docs and cookbook.


If you are requesting JSON back, you should set response_mime_type to "application/json" in the generation config. The model then returns JSON directly, which should avoid these parsing issues.

from dotenv import load_dotenv
import google.generativeai as genai
import os

load_dotenv()
genai.configure(api_key=os.environ['API_KEY'])
MODEL_NAME_LATEST = os.environ['MODEL_NAME_LATEST']

model = genai.GenerativeModel(
    model_name=MODEL_NAME_LATEST,
    # Set the `response_mime_type` to output JSON
    generation_config={"response_mime_type": "application/json"})

prompt = """
  List 5 popular cookie recipes.
  Using this JSON schema:
    Recipe = {"recipe_name": str}
  Return a `list[Recipe]`
  """

response = model.generate_content(prompt)
print(response.text)

Just make sure that the JSON schema you describe in the prompt is itself valid JSON, with all commas where they should be. Otherwise the model may build its output incorrectly.
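
To feed the result into an application you still have to parse `response.text`. Here is a minimal sketch. It assumes a hypothetical helper `parse_with_retry` and a stand-in `generate` callable in place of `model.generate_content(prompt).text`, so that it runs without an API key:

```python
import json


def parse_with_retry(generate, max_attempts=3):
    """Call generate() until its output parses as JSON, or give up.

    `generate` is any zero-argument callable returning a string, for example
    lambda: model.generate_content(prompt).text (hypothetical wiring).
    """
    last_error = None
    for _ in range(max_attempts):
        raw = generate()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err  # malformed output: ask again
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error


# Stand-in generator: the first reply is malformed, the second is valid.
replies = iter(['{"recipe_name": "Oat"meal"}', '[{"recipe_name": "Oatmeal"}]'])
recipes = parse_with_retry(lambda: next(replies))
print(recipes)  # [{'recipe_name': 'Oatmeal'}]
```

With JSON mode enabled the retry branch should rarely run, but capping the attempts keeps delays and costs bounded if it does.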

response schema

Another option would be to use response schema.

from dotenv import load_dotenv
import google.generativeai as genai
import os
import typing_extensions as typing

load_dotenv()
genai.configure(api_key=os.environ['API_KEY'])
MODEL_NAME_LATEST = os.environ['MODEL_NAME_LATEST']


class Recipe(typing.TypedDict):
    recipe_name: str


model = genai.GenerativeModel(
    model_name=MODEL_NAME_LATEST,
    # Set the `response_mime_type` to output JSON
    # Pass the schema object to the `response_schema` field
    generation_config={"response_mime_type": "application/json",
                       "response_schema": list[Recipe]})

prompt = "List 5 popular cookie recipes"

response = model.generate_content(prompt)
print(response.text)
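
Even with a schema, it can be worth checking the parsed value before handing it to the application. Here is a minimal sketch, assuming a hypothetical `parse_recipes` helper and a hard-coded sample string standing in for `response.text`:

```python
import json


def parse_recipes(raw):
    """Parse raw JSON text and check that it looks like list[Recipe]."""
    data = json.loads(raw)  # raises json.JSONDecodeError on bad syntax
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of recipes")
    for item in data:
        if not isinstance(item, dict) or not isinstance(item.get("recipe_name"), str):
            raise ValueError(f"unexpected recipe entry: {item!r}")
    return data


# Sample text standing in for response.text
sample = '[{"recipe_name": "Chocolate Chip"}, {"recipe_name": "Snickerdoodle"}]'
names = [r["recipe_name"] for r in parse_recipes(sample)]
print(names)  # ['Chocolate Chip', 'Snickerdoodle']
```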

See JSON mode in the Gemini API docs.
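
If malformed strings still turn up (for example, from responses produced without JSON mode), one possible fallback is to tighten the lookahead in the question's regex so that a comma only ends a string when the next thing is another opening quote. This is an illustrative heuristic, not part of the Gemini API, and it assumes a flat object whose keys and values are all strings. The name `repair_quotes` is hypothetical:

```python
import json
import re

# A quote closes a string only if it is followed by ':', '}' or ']',
# or by ',' and then the opening quote of the next string.
PATTERN = re.compile(r'([:\[,{]\s*)"(.*?)"(?=\s*(?:[:}\]]|,\s*"))')


def repair_quotes(text):
    """Replace double quotes inside each matched string with single quotes."""
    def repl(m):
        return m.group(1) + '"' + m.group(2).replace('"', "'") + '"'
    return PATTERN.sub(repl, text)


broken = '{"test":"this is "test" of "a", test"","result":"your result is "out" in our website"}'
fixed = repair_quotes(broken)
print(json.loads(fixed))
```

It still fails when the text itself contains a sequence such as `", "` or `":`, so treat it as a last resort and always validate the output with `json.loads`.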


Using REGEX to Handle Nested Double Quotes in JSON Strings in Python
