In my use case I am using OpenAI models hosted on Azure. I am trying to generate a list of sentences or words of a specific length. Let's take this prompt as an example:
Give 10 Examples of pizza ingredients:
1. tomatoes
2. mushrooms
The text-davinci-003 model completes the list as expected and then stops, but the gpt-3.5-turbo model keeps generating tokens until the token limit is reached, even when I tell the model to stop once the task is done. Few-shot prompting also doesn't seem to work here.
Hacky workarounds
- Using a low value for max_tokens. But it is hard to estimate a good value because parts of the prompt are changed dynamically in the application, and the output still needs postprocessing to remove wasted tokens.
- Putting a counter before the examples and using a specific number as the stop sequence. With a plain counter like the one above, I have to make sure the stop sequence isn't generated accidentally somewhere else in the content. With an unusual counter like "1~~", "2~~", … there is a chance that the model malforms the stop sequence, so it still generates until the limit is reached. (A sketch of both workarounds follows this list.)
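For reference, here is a minimal sketch combining both workarounds (a generous max_tokens plus the item counter as the stop sequence), assuming the 0.x openai Python SDK pointed at an Azure deployment. The endpoint, key, API version and deployment name below are placeholders, not the real setup:

```python
# A sketch, not the exact setup: endpoint, key, API version and deployment
# name are placeholders for the 0.x openai SDK against Azure OpenAI.
import openai

openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"  # placeholder
openai.api_version = "2023-05-15"                            # placeholder
openai.api_key = "YOUR-KEY"                                  # placeholder

messages = [
    {"role": "user", "content": (
        "Give 10 examples of pizza ingredients:\n"
        "1. tomatoes\n"
        "2. mushrooms\n"
        "3."
    )},
]

response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",   # Azure deployment name (placeholder)
    messages=messages,
    max_tokens=200,          # generous cap; the stop sequence is the real limiter
    stop=["11."],            # stop as soon as the model starts item 11
    temperature=0,
)
print(response["choices"][0]["message"]["content"])
```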
Is there a clean and easy solution to let the model stop generating, like text-davinci-003 does?
Answers
In this notebook, there are two strategies for keeping the conversation under the 4096-token limit when working with the ChatGPT model:
Option 1: Keep the conversation within a given token limit
Option 2: Keep the conversation within a given number of turns
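A rough sketch of what those two options amount to, assuming a standard messages list and using tiktoken for counting; the function names and the trimming policy here are mine, not taken from the notebook:

```python
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

def num_tokens(messages):
    # Rough count: content tokens only, ignoring per-message formatting overhead.
    return sum(len(encoding.encode(m["content"])) for m in messages)

def trim_by_tokens(messages, max_tokens=4096):
    # Option 1: drop the oldest non-system messages until the total fits.
    trimmed = list(messages)
    while len(trimmed) > 1 and num_tokens(trimmed) > max_tokens:
        trimmed.pop(1)  # keep the system prompt at index 0, drop the oldest turn
    return trimmed

def trim_by_turns(messages, max_turns=10):
    # Option 2: keep the system prompt plus only the last max_turns messages.
    system, rest = messages[0], messages[1:]
    return [system] + rest[-max_turns:]
```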
I am also running into the same issue. One hack is to add an end marker: append something like --end-- after the examples and set that as the stop sequence, since lowering the token limit is restrictive and not foolproof. I am still looking into why the two models behave differently.
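A minimal sketch of that end-marker hack with the 0.x openai ChatCompletion API; --end-- is just an arbitrary sentinel the prompt asks for, not anything the model treats specially:

```python
# Ask the model to emit a sentinel when done, then use it as the stop sequence.
import openai

messages = [
    {"role": "system", "content": "When the list is complete, write --end-- and nothing else."},
    {"role": "user", "content": (
        "Give 10 examples of pizza ingredients:\n"
        "1. tomatoes\n"
        "2. mushrooms"
    )},
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
    stop=["--end--"],  # the API cuts generation before the sentinel is returned
    temperature=0,
)
print(response["choices"][0]["message"]["content"])
```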