
I am testing the OCR enhancement with GPT-4 Vision (model version: gpt-4 vision preview).

  • I send the request from Postman with the OCR enhancement body described in the API documentation:

        "enhancements": { "ocr": { "enabled": true }, "grounding": { "enabled": true } },
        "dataSources": [ { "type": "AzureComputerVision", "parameters": { "endpoint": "https://xyz.cognitiveservices.azure.com", "key": "xyz" } } ]

    However, I get a 400 model error:

        "message": "1 validation error for Request\nbody -> enhancements\n extra fields not permitted (type=value_error.extra)",
        "type": "invalid_request_error",

    Reference: Azure OpenAI Service REST API reference – Azure OpenAI | Microsoft Learn

https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#example-request-2

Can you please help me get the OCR enhancement working through the API?

2 Answers


  1. You can use the following URI path for OCR enhancements: {base_url}/extensions/chat/completions

    I’m using the sample below (Python), and it works properly:
    https://learn.microsoft.com/en-us/azure/ai-services/openai/gpt-v-quickstart?tabs=image&pivots=rest-api

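    # Packages required:
    import requests
    import json

    # Fill in the values for your own Azure OpenAI resource
    api_base = '<your_azure_openai_endpoint>'
    deployment_name = '<your_deployment_name>'
    API_KEY = '<your_azure_openai_key>'

    # Strip any trailing slash so the URL is well-formed either way
    base_url = f"{api_base.rstrip('/')}/openai/deployments/{deployment_name}"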
    headers = {   
        "Content-Type": "application/json",   
        "api-key": API_KEY 
    } 
    
    # Prepare endpoint, headers, and request body 
    endpoint = f"{base_url}/extensions/chat/completions?api-version=2023-12-01-preview" 
    data = {
        "model": "gpt-4-vision-preview",
        "enhancements": {
            "ocr": {
              "enabled": True
            },
            "grounding": {
              "enabled": True
            }
        },
        "dataSources": [
        {
            "type": "AzureComputerVision",
            "parameters": {
                "endpoint": "<your_computer_vision_endpoint>",
                "key": "<your_computer_vision_key>"
            }
        }],
        "messages": [ 
            { "role": "system", "content": "You are a helpful assistant." }, 
            { "role": "user", 
            "content": [  
                { 
                    "type": "text", 
                    "text": "Describe this picture:" 
                },
                { 
                    "type": "image_url", 
                    "image_url": {
                        "url" : "<URL or base 64 encoded image>"
                    }
                }
            ]} 
        ], 
        "max_tokens": 2000 
    }   
    
    # Make the API call   
    response = requests.post(endpoint, headers=headers, data=json.dumps(data))   
    
    print(f"Status Code: {response.status_code}")   
    print(response.text)
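
    The sample above prints the raw response body. As a minimal sketch of pulling out just the assistant's text, assuming the usual chat-completions layout (choices[0].message.content): the helper name extract_reply and the sample payload below are hypothetical, for illustration only, and were not part of the original answer.

```python
from typing import Optional


def extract_reply(payload: dict) -> Optional[str]:
    """Return the assistant text from a chat-completions style payload.

    Returns None when the payload has no choices (for example, an error body).
    """
    choices = payload.get("choices") or []
    if not choices:
        return None
    return choices[0].get("message", {}).get("content")


# Hypothetical payload, made up for illustration; a real response has more fields.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "A cat on a sofa."}}
    ]
}
print(extract_reply(sample))
```

    With the sample above, you would call extract_reply(response.json()) after checking that response.status_code is 200.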
    
  2. I was able to reproduce your issue simply by removing extensions from the URL. I made the same mistake on my first try 🙂

    It is an easy mistake to make: enhanced chat with vision requires a different endpoint, with extensions placed in front of chat/completions.

    So the correct endpoint to call for this feature is:

    https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/extensions/chat/completions?api-version=2023-12-01-preview

    Make sure the path includes extensions.
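
    To make the difference between the two paths concrete, here is a small sketch that builds either URL from its parts. The function name build_chat_url and its enhanced flag are hypothetical helpers for illustration; only the URL layout and the api-version value come from the endpoint shown above.

```python
from urllib.parse import urlencode


def build_chat_url(resource_name: str,
                   deployment_name: str,
                   api_version: str = "2023-12-01-preview",
                   enhanced: bool = True) -> str:
    """Build the chat completions URL for an Azure OpenAI deployment.

    With enhanced=True the path includes /extensions/, which is what the
    answer above says the OCR and grounding enhancements need.
    """
    base = (f"https://{resource_name}.openai.azure.com"
            f"/openai/deployments/{deployment_name}")
    path = "/extensions/chat/completions" if enhanced else "/chat/completions"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"


# Placeholder names, exactly as in the endpoint above
print(build_chat_url("YOUR_RESOURCE_NAME", "YOUR_DEPLOYMENT_NAME"))
```

    Sending a body that contains enhancements to the URL built with enhanced=False is the situation that reproduced the 400 error in the question.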
