I’m using a generative AI API to return text responses as JSON strings, which I intend to feed into an application in real time. The problem is that the JSON returned by the API often includes small errors, most commonly with double quotes. These syntax errors make my Python code fail when it parses the string as JSON.

For instance, I have the following JSON string:
'{"test":"this is "test" of "a" test"","result":"your result is "out" in our website"}'

As you can see, the value for "test" contains multiple unescaped double quotes, so parsing it as JSON raises an error. What I want to do is use a regex to convert the inner double quotes to single quotes, so that the result looks like this:
'{"test":"this is 'test' of 'a' test'", "result": "your result is 'out' in our website"}'
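For reference, the failure and the desired outcome can be reproduced with the standard `json` module alone. This is only a minimal sketch; the names `broken`, `wanted` and `try_parse` are made up for illustration.

```python
import json

# The malformed string from the example, and the hand-written target string.
broken = '{"test":"this is "test" of "a" test"","result":"your result is "out" in our website"}'
wanted = '{"test":"this is \'test\' of \'a\' test\'", "result": "your result is \'out\' in our website"}'


def try_parse(text):
    """Return (parsed, None) on success or (None, error message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


parsed, error = try_parse(broken)
print(error)           # the parser gives up at the first stray double quote

parsed, error = try_parse(wanted)
print(parsed["test"])  # parses fine once the inner quotes are single quotes
```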

The best I can do is as follows:

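import re

def repl_call(m):
    preq = m.group(1)    # delimiter (plus whitespace) before the opening quote
    qbody = m.group(2)   # everything between the outer quotes
    qbody = re.sub(r'"', "'", qbody)   # inner double quotes -> single quotes
    return preq + '"' + qbody + '"'

# text holds the JSON string shown above
print(re.sub(r'([:\[,{]\s*)"(.*?)"(?=\s*[:,\]}])', repl_call, text))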

The code above returns the intended result. However, if I add a comma inside the value, such as
{"test":"this is "test" of "a", test"","result":"your result is "out" in our website"}

…the code breaks and returns the following:
'{"test":"this is 'test' of 'a", test"","result":"your result is 'out' in our website"}'

🙁
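The break seems to come from the lookahead: any `"` followed by a comma is accepted as a closing quote, so the match ends at `"a",`. One possible tightening, shown here only as a heuristic sketch, is to accept a comma as a terminator only when the next non-space character is the opening quote of another string. The names `STRING_RE`, `repair_inner_quotes` and `loads_lenient` are hypothetical, and the pattern assumes string-only keys and values; a value that itself contains something like `","` would still fool it.

```python
import json
import re

# A quote counts as closing only when followed by ':' (end of a key),
# ']' or '}' (end of a container), or ',' and then the opening quote
# of the next string. Assumption: keys and values are all strings.
STRING_RE = re.compile(r'([:\[,{]\s*)"(.*?)"(?=\s*(?:[:\]}]|,\s*"))')


def repair_inner_quotes(text):
    """Swap double quotes inside string bodies for single quotes (heuristic)."""
    def fix(m):
        return m.group(1) + '"' + m.group(2).replace('"', "'") + '"'
    return STRING_RE.sub(fix, text)


def loads_lenient(text):
    """Parse strictly first; attempt the repair only if that fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return json.loads(repair_inner_quotes(text))


text = '{"test":"this is "test" of "a", test"","result":"your result is "out" in our website"}'
print(loads_lenient(text))
```

Trying strict parsing first matters here: a valid string with properly escaped quotes (`\"`) would be damaged by the repair, so the repair should only ever run on input that already failed to parse.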

I’ve tried improving my prompt (prompt engineering) so that the model avoids the extra double quotes and returns only a valid JSON string. This works to some degree, but I still hit enough syntax errors that I have to retry the same prompt multiple times, which incurs unnecessary delays and costs.

My question is:
Is there a common function or regex pattern I can apply in Python to repair a JSON string that contains syntax errors, specifically misplaced double quotes?

I’m open to a variety of suggestions, including Python packages that deal with JSON string cleansing, or any advice on GenAI tools that enforce JSON output. I currently use Gemini, which I like a lot, but it doesn’t seem to allow JSON enforcement as explicitly as OpenAI’s API does.

Thank you in advance.

2 Answers


  1. But doesn’t allow JSON enforcement

    You mean like JSON mode? See the docs and the cookbook.

  2. If you are requesting JSON back, you should be setting response_mime_type; then you should not have these issues parsing the JSON.

    from dotenv import load_dotenv
    import google.generativeai as genai
    import os
    
    load_dotenv()
    genai.configure(api_key=os.environ['API_KEY'])
    MODEL_NAME_LATEST = os.environ['MODEL_NAME_LATEST']
    
    model = genai.GenerativeModel(
        model_name=MODEL_NAME_LATEST,
        # Set the `response_mime_type` to output JSON
        generation_config={"response_mime_type": "application/json"})
    
    prompt = """
      List 5 popular cookie recipes.
      Using this JSON schema:
        Recipe = {"recipe_name": str}
      Return a `list[Recipe]`
      """
    
    response = model.generate_content(prompt)
    print(response.text)
    

    Just make sure the JSON schema you give it in the prompt is itself valid JSON, with every comma where it should be; otherwise the model may build its output incorrectly.

    response schema

    Another option would be to use a response schema.

    from dotenv import load_dotenv
    import google.generativeai as genai
    import os
    import typing_extensions as typing
    
    load_dotenv()
    genai.configure(api_key=os.environ['API_KEY'])
    MODEL_NAME_LATEST = os.environ['MODEL_NAME_LATEST']
    
    
    class Recipe(typing.TypedDict):
        recipe_name: str
    
    
    model = genai.GenerativeModel(
        model_name=MODEL_NAME_LATEST,
        # Set the `response_mime_type` to output JSON
        # Pass the schema object to the `response_schema` field
        generation_config={"response_mime_type": "application/json",
                           "response_schema": list[Recipe]})
    
    prompt = "List 5 popular cookie recipes"
    
    response = model.generate_content(prompt)
    print(response.text)
    

    See JSON mode.
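Whichever option you use, it may still be worth parsing the response defensively rather than trusting it blindly. The sketch below checks that a string has the shape the recipe examples ask for; `parse_recipes` and the `sample` string are hypothetical stand-ins (in the snippets above you would pass `response.text` instead), and no API call is made here.

```python
import json


def parse_recipes(raw):
    """Parse a JSON array of {"recipe_name": str} objects, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array at the top level")
    for item in data:
        if not isinstance(item, dict) or not isinstance(item.get("recipe_name"), str):
            raise ValueError(f"unexpected item: {item!r}")
    return data


# Hypothetical sample standing in for response.text
sample = '[{"recipe_name": "Chocolate Chip Cookies"}, {"recipe_name": "Oatmeal Raisin Cookies"}]'
recipes = parse_recipes(sample)
print([r["recipe_name"] for r in recipes])
```

Raising a single exception type gives the calling code one place to decide whether to retry the prompt or fall back to something else.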
