Why doesn't the azure-openai-emit-token-metric policy capture token usage for GPT4o stream responses?

silenthunter25
December 5, 2024

I am working on a Retrieval-Augmented Generation (RAG) application that uses Azure OpenAI GPT4o for two types of API calls:

Rephrasing the question (non-streaming call)
Generating a response (streaming call, with stream=True)

I configured the azure-openai-emit-token-metric policy in Azure API Management (APIM) to estimate token usage. It works correctly for non-streaming API calls but does not capture token usage metrics for streaming responses.

I have the following in my inbound policy:

<when condition="@(context.Request.Headers.GetValueOrDefault("Ocp-Apim-Subscription-Key") == "SERVICE_A_KEY")">
    <azure-openai-emit-token-metric namespace="AzureOpenAI">
        <dimension name="service" value="SERVICE_A" />
    </azure-openai-emit-token-metric>
</when>

Now here’s the response I’m currently getting:

Currently, only the query rephrasing (non-streaming) call is being logged. I also want to log the tokens consumed by the streaming response, which would add 3 more rows for the response generation tokens.

I’m separately logging token usage by enabling the stream_options: {"include_usage": true} option in the OpenAI API, but I want to consolidate this logging within APIM using the azure-openai-emit-token-metric policy.
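
For context, here is a minimal sketch of how that separate logging can read the usage out of a streamed response. It assumes the stream arrives as server-sent-event lines and that, with include_usage enabled, the last chunk before [DONE] has an empty choices list and a populated usage object. The function name usage_from_sse and the sample lines are hypothetical and only illustrate the shape of the data.

```python
import json


def usage_from_sse(lines):
    """Return the usage dict from the last streamed chunk that carries one, else None."""
    usage = None
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data event.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # Content chunks carry "usage": null; only the final chunk has numbers.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


# Hypothetical sample stream, shaped like a chat completion with include_usage enabled.
sample_stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}], "usage": null}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}], "usage": null}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 2, "total_tokens": 14}}',
    "",
    "data: [DONE]",
]

print(usage_from_sse(sample_stream))
```

These are the counts reported by the model itself, whereas the APIM policy estimates token metrics for streaming requests, so the two numbers may not match exactly.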

The official docs do say: "Certain Azure OpenAI endpoints support streaming of responses. When stream is set to true in the API request to enable streaming, token metrics are estimated."

Is it possible to make the azure-openai-emit-token-metric policy work for streaming responses with gpt-4o?

Answers

- JayashankarGS
- November 28, 2024 at 1:52 pm
- 0 votes
According to this documentation, the supported OpenAI models are:

Chat completion: gpt-3.5 and gpt-4
Completion: gpt-3.5-turbo-instruct
Embeddings: text-embedding-3-large, text-embedding-3-small, text-embedding-ada-002

So, use one of these models.

I tried gpt-4o and did not get the token metrics either. When I switched to gpt-4, the tokens showed up in the logs.

Output:

The request I made:

After a successful request, the message output:

Next, go to Trace and then Outbound. You will see the token usage details there, and the same values are sent to the logs.


- AdamHockemeyer
- December 5, 2024 at 8:28 pm
- 0 votes
An update announced at Ignite says that GPT-4o models will support these policies, and the update is rolling out to API Management through the end of 2024.

In short – it’s coming.

GPT-4o Support Announcement
