
I am working on a Retrieval-Augmented Generation (RAG) application that uses Azure OpenAI GPT4o for two types of API calls:

  1. Rephrasing the question (non-streaming call)
  2. Generating a response (streaming call, with stream=True)

I configured the azure-openai-emit-token-metric policy in Azure API Management (APIM) to estimate token usage. It works correctly for non-streaming API calls but does not capture token usage metrics for streaming responses.
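To make the difference concrete, here is a minimal sketch (hypothetical helper names; the deployment and prompt strings are placeholders, not my actual app) of the two request bodies. The non-streaming call returns one JSON document with a usage field, while the streaming call returns server-sent events with no usage field by default, which is why APIM has to estimate tokens for it:

```python
# Hypothetical sketch of the two request bodies sent through APIM.
def rephrase_payload(question: str) -> dict:
    # 1. Non-streaming call: the response is a single JSON document
    #    whose "usage" field reports exact token counts.
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": f"Rephrase the question: {question}"}],
        "stream": False,
    }

def answer_payload(question: str, context_chunks: list[str]) -> dict:
    # 2. Streaming call: the response arrives as server-sent events,
    #    with no usage field by default, so tokens must be estimated.
    return {
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": "Answer using this context:\n" + "\n".join(context_chunks)},
            {"role": "user", "content": question},
        ],
        "stream": True,
    }
```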

I have the following in my inbound policy (this <when> sits inside a <choose> element; note the inner quotes must be XML-escaped as &quot; for the policy to be valid XML):

<when condition="@(context.Request.Headers.GetValueOrDefault(&quot;Ocp-Apim-Subscription-Key&quot;) == &quot;SERVICE_A_KEY&quot;)">
    <azure-openai-emit-token-metric namespace="AzureOpenAI">
        <dimension name="service" value="SERVICE_A" />
    </azure-openai-emit-token-metric>
</when>

Currently only the query-rephrasing (non-streaming) call is getting logged (screenshot omitted). I want to also log the tokens consumed by the streaming responses, so we would have three more rows with response-generation tokens.

I’m separately logging token usage by enabling the stream_options: {"include_usage": true} option in the OpenAI API, but I want to consolidate this logging within APIM using the azure-openai-emit-token-metric policy.

The official docs do say: "Certain Azure OpenAI endpoints support streaming of responses. When stream is set to true in the API request to enable streaming, token metrics are estimated."

Is it possible to make the azure-openai-emit-token-metric policy work for streaming responses with gpt-4o?

2 Answers


  1. According to this documentation, the supported Azure OpenAI models are:

    Chat completion: gpt-3.5 and gpt-4
    Completion: gpt-3.5-turbo-instruct
    Embeddings: text-embedding-3-large, text-embedding-3-small, text-embedding-ada-002

    So, use any one of these models.

    I tried with gpt-4o and did not get the token metrics; then I tried with gpt-4 and the tokens appeared in the logs.

    Output:

    I made the request; after it succeeded, the message output showed the completion. Next, go to Trace and then Outbound: you will get the token usage details there, and the same values are sent to the logs. (Screenshots omitted.)

  2. There was an update announced at Ignite saying GPT-4o models will support these policies, with the updates rolling out to API Management through the end of 2024.

    In short – it’s coming.

    GPT-4o Support Announcement
