
I’m working on a Python script that extracts JSON content from a message string. The function is designed to parse a JSON block enclosed in ```json fences. Below are the JSON input and the script I’m using:

JSON Input:

json
{
  "description": "Crie um novo projeto chamado 'ProjetoTeste1'.",
  "code": "local new_project = NewProject('ProjetoTeste1')
if new_project then
    print('Projeto criado com sucesso: ' .. new_project.name)
else
    print('Falha ao criar o projeto')
end"
}

Python Script:

import re
import json
from typing import Any

from langchain_core.messages import AIMessage


class CodeSolution:
    def __init__(self, prefix: str, code: str):
        self.prefix = prefix
        self.code = code

def escape_code_field(code_str):
    # Escape backslashes first so the escapes added below are not doubled
    code_str = code_str.replace('\\', '\\\\')
    # Escape double quotes
    code_str = code_str.replace('"', '\\"')
    # Escape newlines
    code_str = code_str.replace('\n', '\\n')
    return code_str

def unescape_code_field(code_str):
    # Unescape newlines
    code_str = code_str.replace('\\n', '\n')
    # Unescape double quotes
    code_str = code_str.replace('\\"', '"')
    # Unescape backslashes
    code_str = code_str.replace('\\\\', '\\')
    return code_str


def extract_json(message) -> Any:
    text = message.content
    print("Message content:")
    print(text)
    pattern = r"```json\s*({.*?})\s*```"
    matches = re.findall(pattern, text, re.DOTALL)

    if not matches:
        raise ValueError("No JSON content found in the message.")
    
    json_content = matches[0].strip()
    
    # Escape the code field
    code_pattern = r'("code"\s*:\s*")([\s\S]*?)("\s*[,}])'
    def replace_code(match):
        code_value = match.group(2)
        escaped_code = escape_code_field(code_value)
        return f'{match.group(1)}{escaped_code}{match.group(3)}'
    
    json_content_escaped = re.sub(code_pattern, replace_code, json_content)
    
    try:
        parsed = json.loads(json_content_escaped)
    except json.JSONDecodeError as e:
        print(f"Error parsing content with JSON: {e}")
        print("Escaped JSON content:")
        print(json_content_escaped)
        raise ValueError(f"Failed to parse JSON content: {e}") from e

    prefix = parsed.get('description', 'No description available')
    code = parsed.get('code', '')

    if not code:
        raise ValueError("No 'code' field found in the parsed JSON.")

    code = unescape_code_field(code)

    return CodeSolution(prefix=prefix, code=code)

Error Traceback:

ValueError: No JSON content found in the message.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...
ValueError: No JSON content found in the message.
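The traceback comes from the `if not matches` branch, so the regex never matched at all; the escaping code is not reached. One thing worth checking is whether the failing message really contains the ```json fence. The sketch below uses a hypothetical helper `find_fenced_json` and made-up sample strings; the pattern is equivalent to the one in extract_json with its backslashes written out. It finds the block only when the fence is literally present:

```python
import re

# Equivalent to the pattern in extract_json, with the backslashes written out.
FENCED_JSON = re.compile(r"```json\s*(\{.*?\})\s*```", re.DOTALL)


def find_fenced_json(text):
    """Return the first ```json fenced object, or None (hypothetical helper)."""
    match = FENCED_JSON.search(text)
    return match.group(1) if match else None


body = '{\n  "description": "Crie um novo projeto",\n  "code": "end"\n}'
fenced = "```json\n" + body + "\n```"
unfenced = "json\n" + body + "\n"

print(find_fenced_json(fenced) is not None)    # True
print(find_fenced_json(unfenced) is not None)  # False: no ``` fence, so no match
```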

Additional Information:

The JSON that works is more complex and includes nested structures, whereas the failing JSON is simpler.
Both JSON blocks are correctly formatted and enclosed within ```json fences in the message content.
I adapted the script to handle the simpler JSON, but it still fails to recognize it.
Working JSON Example:

json
{
  "description": "Obtaining the production of the well P10 in the model 'Modelo10' of the project 'ProjetoTeste10'",
  "code": "local projects = GetProjects()
local project = projects['ProjetoTeste10']
if project then
    local model = project.flux['Modelo10']
    if model then
        local well = model.well['P10']
        if well then
            local np = well.data['NP']
            if np then
                print('Produção acumulada do poço P10: ' .. np[#np])
            else
                print('Dados de produção não encontrados para o poço P10.')
            end
        else
            print('Poço P10 não encontrado no modelo "Modelo10".')
        end
    else
        print('Modelo "Modelo10" não encontrado no projeto "ProjetoTeste10".')
    end
else
        print('Projeto "ProjetoTeste10" não encontrado.')
    end"
}
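One parsing-related difference between the two payloads: the working one has unescaped double quotes inside the code string ("Modelo10"), while the simpler one only has raw newlines. The standard library's json.loads rejects raw newlines inside strings by default but accepts them with strict=False; embedded quotes break parsing either way, which is what escape_code_field works around. A small sketch with made-up snippets (`try_parse` is a hypothetical helper):

```python
import json

# Raw newlines inside the string value, like the simpler payload.
simple = '{"code": "if ok then\n    print(\'ok\')\nend"}'
# Raw newline plus unescaped inner double quotes, like the working payload.
quoted = '{"code": "print(\'Modelo "Modelo10" missing\')\nend"}'


def try_parse(text, strict):
    """Return True if json.loads accepts text (hypothetical helper)."""
    try:
        json.loads(text, strict=strict)
        return True
    except json.JSONDecodeError:
        return False


print(try_parse(simple, strict=True))   # False: raw newline is a control character
print(try_parse(simple, strict=False))  # True: control characters allowed in strings
print(try_parse(quoted, strict=False))  # False: inner quotes end the string early
```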

What I’ve Tried:

  • Verified that the JSON is correctly formatted.
  • Checked the regex pattern to ensure it accurately captures the JSON block.
  • Added print statements to debug and confirm the message content.
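Plain print() can hide the details that decide whether the regex matches, such as a missing fence or a capitalized ```JSON tag. Printing repr() of the content and running a few membership checks makes them visible. A sketch with a made-up sample message and a hypothetical helper `describe_fence`; the uppercase tag is only an assumed example of what might differ:

```python
def describe_fence(text):
    """Report which fence markers are present (hypothetical debugging helper)."""
    return {
        "has_backticks": "```" in text,
        "has_json_tag": "```json" in text,
        "has_json_tag_any_case": "```json" in text.lower(),
    }


# Assumed example: the model wrote the tag in upper case.
sample = '```JSON\n{"code": "end"}\n```'

print(repr(sample))            # shows the exact characters, including \n
print(describe_fence(sample))
# {'has_backticks': True, 'has_json_tag': False, 'has_json_tag_any_case': True}
```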

Request for Help:

Why is the script unable to find JSON content in the simpler JSON example while successfully parsing the more complex one? Is there an issue with the regex pattern or the way the JSON is being processed? Any guidance on how to fix this would be greatly appreciated. Thank you!


Answers


  1. I’m happy to help, but please provide more info, as I think you have something set up wrong (or are using the wrong tools). What exactly is this code doing?
    What version of Python are you running? I was able to get it working with Python 3.12 on Linux Mint 22. I may be misunderstanding, but when I run your code with this main function:

    import re
    import json
    from langchain_core.messages import AIMessage
    
    
    class CodeSolution:
        ...  # unchanged from the question
    
    def escape_code_field(code_str):
        ...  # unchanged from the question
    
    
    def unescape_code_field(code_str):
        ...  # unchanged from the question
    
    
    def extract_json(message):
        ...  # body unchanged from the question
    
        return CodeSolution(prefix=prefix, code=code)
    
    def main():
        text = AIMessage(
        '''```json
        {
          "description": "Crie um novo projeto chamado 'ProjetoTeste1'.",
          "code": "local new_project = NewProject('ProjetoTeste1')
        if new_project then
            print('Projeto criado com sucesso: ' .. new_project.name)
        else
            print('Falha ao criar o projeto')
        end"
        }
        ```'''
        )
        result = extract_json(text)
        print("Result prefix:\n", result.prefix)
        print("Result code:\n", result.code)
    
    
    if __name__ == '__main__':
        main()
    
    

    I got the output:

    Message content:
    ```json
        {
          "description": "Crie um novo projeto chamado 'ProjetoTeste1'.",
          "code": "local new_project = NewProject('ProjetoTeste1')
        if new_project then
            print('Projeto criado com sucesso: ' .. new_project.name)
        else
            print('Falha ao criar o projeto')
        end"
        }
        ```
    Result prefix:
     Crie um novo projeto chamado 'ProjetoTeste1'.
    Result code:
     local new_project = NewProject('ProjetoTeste1')
        if new_project then
            print('Projeto criado com sucesso: ' .. new_project.name)
        else
            print('Falha ao criar o projeto')
        end
    
  2. The solution below works with your examples.

    What I changed:

    • Made a reproducible example with a minimal Message class definition.
    • Added two message examples.
    • Removed the extraneous -> Any from the extract_json function.
    • Removed the triple backticks from the pattern in extract_json.
    • Added | re.MULTILINE to the flags passed to re.findall.
    • Tested both messages.
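A variation on the same idea, assuming one JSON object per message: anchoring on the json tag alone makes the ``` fence optional, re.IGNORECASE also accepts ```JSON, and a greedy .* keeps nested objects intact instead of stopping at the first closing brace. `find_json_block` and the samples are hypothetical:

```python
import json
import re

# Matching on the "json" tag alone makes the ``` fence optional;
# IGNORECASE also accepts ```JSON. The greedy .* runs to the LAST closing
# brace, so nested objects survive (assumes one JSON object per message).
JSON_BLOCK = re.compile(r"json\s*(\{.*\})", re.DOTALL | re.IGNORECASE)


def find_json_block(text):
    """Return the object text after the json tag, or None (hypothetical helper)."""
    match = JSON_BLOCK.search(text)
    return match.group(1) if match else None


fenced = '```json\n{"a": {"b": 1}}\n```'
plain = 'json\n{"a": {"b": 1}}\n'
upper = '```JSON\n{"a": 1}\n```'

for sample in (fenced, plain, upper):
    print(json.loads(find_json_block(sample)))
```

This trades strictness for tolerance: the word "json" appearing earlier in the prose would also match, so it suits messages that contain little besides the JSON block.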

    Code:

    import re
    import json
    
    class Message:
        pass
    
    message1 = Message()
    message1.content = '''
    json
    {
      "description": "Crie um novo projeto chamado 'ProjetoTeste1'.",
      "code": "local new_project = NewProject('ProjetoTeste1')
    if new_project then
        print('Projeto criado com sucesso: ' .. new_project.name)
    else
        print('Falha ao criar o projeto')
    end"
    }
    '''
    
    message2 = Message()
    message2.content = '''
    json
    {
      "description": "Obtaining the production of the well P10 in the model 'Modelo10' of the project 'ProjetoTeste10'",
      "code": "local projects = GetProjects()
    local project = projects['ProjetoTeste10']
    if project then
        local model = project.flux['Modelo10']
        if model then
            local well = model.well['P10']
            if well then
                local np = well.data['NP']
                if np then
                    print('Produção acumulada do poço P10: ' .. np[#np])
                else
                    print('Dados de produção não encontrados para o poço P10.')
                end
            else
                print('Poço P10 não encontrado no modelo "Modelo10".')
            end
        else
            print('Modelo "Modelo10" não encontrado no projeto "ProjetoTeste10".')
        end
    else
            print('Projeto "ProjetoTeste10" não encontrado.')
        end"
    }
    '''
    
    class CodeSolution:
        def __init__(self, prefix: str, code: str):
            self.prefix = prefix
            self.code = code
    
    def escape_code_field(code_str):
        # Escape backslashes first
        code_str = code_str.replace('\\', '\\\\')
        # Escape double quotes
        code_str = code_str.replace('"', '\\"')
        # Escape newlines
        code_str = code_str.replace('\n', '\\n')
        return code_str
    
    def unescape_code_field(code_str):
        # Unescape newlines
        code_str = code_str.replace('\\n', '\n')
        # Unescape double quotes
        code_str = code_str.replace('\\"', '"')
        # Unescape backslashes
        code_str = code_str.replace('\\\\', '\\')
        return code_str
    
    
    def extract_json(message):
        text = message.content
        print("Message content:")
        print(text)
        pattern = r"json\s*({.*?})\s*"
        matches = re.findall(pattern, text, re.DOTALL | re.MULTILINE)
    
        if not matches:
            raise ValueError("No JSON content found in the message.")
        
        json_content = matches[0].strip()
        
        # Escape the code field
        code_pattern = r'("code"\s*:\s*")([\s\S]*?)("\s*[,}])'
        def replace_code(match):
            code_value = match.group(2)
            escaped_code = escape_code_field(code_value)
            return f'{match.group(1)}{escaped_code}{match.group(3)}'
        
        json_content_escaped = re.sub(code_pattern, replace_code, json_content)
        
        try:
            parsed = json.loads(json_content_escaped)
        except json.JSONDecodeError as e:
            print(f"Error parsing content with JSON: {e}")
            print("Escaped JSON content:")
            print(json_content_escaped)
            raise ValueError(f"Failed to parse JSON content: {e}") from e
    
        prefix = parsed.get('description', 'No description available')
        code = parsed.get('code', '')
    
        if not code:
            raise ValueError("No 'code' field found in the parsed JSON.")
    
        code = unescape_code_field(code)
    
        return CodeSolution(prefix=prefix, code=code)
    
    solution1 = extract_json(message1)
    print(solution1.prefix)
    print()
    print(solution1.code)
    
    print()
    
    solution2 = extract_json(message2)
    print(solution2.prefix)
    print()
    print(solution2.code)
    

    Output:

    Message content:
    json
    {
      "description": "Crie um novo projeto chamado 'ProjetoTeste1'.",
      "code": "local new_project = NewProject('ProjetoTeste1')
    if new_project then
        print('Projeto criado com sucesso: ' .. new_project.name)
    else
        print('Falha ao criar o projeto')
    end"
    }
    
    Crie um novo projeto chamado 'ProjetoTeste1'.
    
    local new_project = NewProject('ProjetoTeste1')
    if new_project then
        print('Projeto criado com sucesso: ' .. new_project.name)
    else
        print('Falha ao criar o projeto')
    end
    
    Message content:
    json
    {
      "description": "Obtaining the production of the well P10 in the model 'Modelo10' of the project 'ProjetoTeste10'",
      "code": "local projects = GetProjects()
    local project = projects['ProjetoTeste10']
    if project then
        local model = project.flux['Modelo10']
        if model then
            local well = model.well['P10']
            if well then
                local np = well.data['NP']
                if np then
                    print('Produção acumulada do poço P10: ' .. np[#np])
                else
                    print('Dados de produção não encontrados para o poço P10.')
                end
            else
                print('Poço P10 não encontrado no modelo "Modelo10".')
            end
        else
            print('Modelo "Modelo10" não encontrado no projeto "ProjetoTeste10".')
        end
    else
            print('Projeto "ProjetoTeste10" não encontrado.')
        end"
    }
    
    Obtaining the production of the well P10 in the model 'Modelo10' of the project 'ProjetoTeste10'
    
    local projects = GetProjects()
    local project = projects['ProjetoTeste10']
    if project then
        local model = project.flux['Modelo10']
        if model then
            local well = model.well['P10']
            if well then
                local np = well.data['NP']
                if np then
                    print('Produção acumulada do poço P10: ' .. np[#np])
                else
                    print('Dados de produção não encontrados para o poço P10.')
                end
            else
                print('Poço P10 não encontrado no modelo "Modelo10".')
            end
        else
            print('Modelo "Modelo10" não encontrado no projeto "ProjetoTeste10".')
        end
    else
            print('Projeto "ProjetoTeste10" não encontrado.')
        end
    

    Note that it would be simpler to write proper JSON in the first place by creating a Python dictionary of the description and code. Use a triple-quoted string to create the code string, and json.dumps will write it with the newlines properly escaped. Then you can call json.loads on it directly without extra handling.

    Example:

    import json
    
    data = {
        "description": "Crie um novo projeto chamado 'ProjetoTeste1'.",
        "code": '''local new_project = NewProject('ProjetoTeste1')
    if new_project then
        print('Projeto criado com sucesso: ' .. new_project.name)
    else
        print('Falha ao criar o projeto')
    end'''
    }
    
    print(json.dumps(data, indent=2))
    

    Output:

    {
      "description": "Crie um novo projeto chamado 'ProjetoTeste1'.",
      "code": "local new_project = NewProject('ProjetoTeste1')\nif new_project then\n    print('Projeto criado com sucesso: ' .. new_project.name)\nelse\n    print('Falha ao criar o projeto')\nend"
    }
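The round trip can be checked directly: json.dumps turns the real newlines into \n escapes, and json.loads turns them back, so no custom escape or unescape step is needed. A minimal sketch reusing the example data:

```python
import json

code = """local new_project = NewProject('ProjetoTeste1')
if new_project then
    print('Projeto criado com sucesso: ' .. new_project.name)
else
    print('Falha ao criar o projeto')
end"""

data = {"description": "Crie um novo projeto chamado 'ProjetoTeste1'.", "code": code}

encoded = json.dumps(data)     # newlines in "code" become \n escape sequences
decoded = json.loads(encoded)  # and are decoded back into real newlines

print("\\n" in encoded)         # True
print(decoded["code"] == code)  # True
```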
    