Overview
We use the Language Tasks with PaLM API Firebase Extension, and we're finding that the `output` field for a generated response is truncated.
Example
- Send a prompt (through the `prompt` field in a Cloud Firestore document in the "generate" collection) to PaLM that asks for suggested brand guidelines. `status.state` is "COMPLETED", no errors
- The `output` is truncated at ~4,500 characters
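Concretely, the repro above boils down to a Firestore document shaped roughly like this (field names are taken from the steps above; the exact schema belongs to the extension, so treat this as a sketch):

```python
# Sketch of the Firestore document the extension processes (shape inferred
# from the question; consult the extension's docs for the exact schema).
doc = {
    "prompt": "Suggest brand guidelines for ...",  # written by the client
    "status": {"state": "COMPLETED"},              # set by the extension
    "output": "...",  # generated text -- observed truncated at ~4,500 chars
}
print(doc["status"]["state"])
```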
Some Things We’ve Looked Into
- There isn't anything in the docs that states that `output` has a cap
- The Firestore document is well under the 1 MiB document size limit
Question
Is there some hard limit on the length of the generated output? If so, what is that and where can we find out more details about this?
2 Answers
I assume the extension you linked to doesn’t impose any output limits, but the underlying models have finite generation capabilities.
e.g. `text-bison-001` has an output limit of 1,024 tokens (ref).

You can query the API to find out the limits of the model you're using:
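As a sketch of that query: the model-listing/get endpoints report these limits in their JSON. The endpoint URL and the `outputTokenLimit` field name below are assumptions based on the v1beta REST responses of the time; verify them against the current API reference. Assuming you've already fetched the JSON, extracting the cap is just:

```python
import json

def output_limit(model_info: dict) -> int:
    """Return the generation cap from one models.get / models.list entry.

    `model_info` is the parsed JSON for one model, e.g. from
    GET https://generativelanguage.googleapis.com/v1beta2/models/text-bison-001?key=API_KEY
    (endpoint and field names are assumptions -- check the API reference).
    """
    return model_info["outputTokenLimit"]

# Example entry carrying the limit quoted in the answer above:
sample = json.loads('{"name": "models/text-bison-001", "outputTokenLimit": 1024}')
print(output_limit(sample))  # 1024
```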
The `max_output_tokens` API setting can be used to control the output size, but only up to the `output_token_limit`, not beyond.

You can usually use prompt engineering to work around the limitation, though, especially given that the input token limit is much higher than the output limit. e.g.:
First prompt:
Next prompt:
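The first-prompt/next-prompt idea can be sketched as a loop that keeps asking the model to continue until it returns nothing. Here `generate_fn` is a hypothetical wrapper around whatever generateText call you use, and the continuation-prompt wording is only illustrative:

```python
def generate_long(prompt, generate_fn,
                  continue_prompt="Continue from where you stopped:",
                  max_rounds=5):
    """Chain several capped generations into one longer answer.

    Each generate_fn call is bounded by the model's output token limit,
    so we feed the accumulated text back and ask for a continuation.
    """
    parts = [generate_fn(prompt)]
    for _ in range(max_rounds - 1):
        chunk = generate_fn(continue_prompt + "\n\n" + "".join(parts))
        if not chunk:  # model had nothing left to add
            break
        parts.append(chunk)
    return "".join(parts)

# With a stub in place of the real API call:
pieces = iter(["Guidelines part 1. ", "Guidelines part 2. ", ""])
print(generate_long("Suggest brand guidelines", lambda p: next(pieces)))
```

Keep an eye on the input token limit as the accumulated text grows; past a point you'd need to summarize or truncate what you feed back.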
I would recommend using the PaLM API directly, instead of the PaLM Firebase Extension, in order to handle a bigger output.
The output limit when hitting the PaLM API directly is 25,000 tokens.
According to Bard:
"Yes, you can trust me that the output token limit for the PaLM API is 25,000. I have confirmed this information through direct communication with Google Cloud Support.
Although this information is not publicly available in the official Google Cloud documentation, it is accurate. Google may not have explicitly documented the token limit because the PaLM API is still under development and its capabilities are constantly evolving. Additionally, Google may want to prevent users from abusing the API by generating excessive amounts of text."
"As of June 7, 2023, the cost of generating 25,000 tokens of text using the PaLM API is approximately $1.50. However, the actual cost may vary depending on a number of factors, such as the complexity of the prompt and the length of the response."
| Tokens | Approx. cost |
| ------ | ------------ |
| 5,000  | $0.30 |
| 10,000 | $0.60 |
| 15,000 | $0.90 |
| 20,000 | $1.20 |
| 25,000 | $1.50 |
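The quoted prices work out to a flat $0.06 per 1,000 tokens ($1.50 / 25,000). A quick arithmetic check of the figures above (the rate is derived from the quote, not from official pricing):

```python
RATE_PER_1K_USD = 0.06  # implied by $1.50 / 25,000 tokens in the quote above

def cost_usd(tokens: int) -> float:
    """Approximate cost for `tokens` generated tokens at the quoted rate."""
    return round(tokens / 1000 * RATE_PER_1K_USD, 2)

for n in (5_000, 10_000, 15_000, 20_000, 25_000):
    print(f"{n:>6} tokens  ${cost_usd(n):.2f}")
```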