I am adding data to my assistant in Azure AI Studio manually by dragging and dropping a file. The file is a 3.3 MB PDF, 265 pages long. When the index is created, I get this warning:
Your data was connected with the following warnings
Truncated extracted text to ‘65536’ characters. (1 item(s) impacted)
This makes me think that the whole PDF is not available in the index.
No matter what chunk size I select, I get this warning, and there doesn’t seem to be any other setting that affects the outcome. How can this be fixed?
2 Answers
In Azure AI Search, the amount of text an indexer can extract from each document is capped, and the cap depends on the search service SKU. Based on your warning, it looks like you are using a Basic Azure AI Search instance. An S1 instance would let the indexer extract up to 4 million characters per document instead of roughly 64 thousand, which is usually sufficient for most customer documents.
Reference: Indexer limits. See the "Blob indexer: maximum characters of content extracted from a blob" limit.
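To get a feel for how much text is lost, here is a minimal sketch of the arithmetic. The Basic value is taken from the warning text and the S1 value from this answer; the 1,500 characters per page is purely a hypothetical average, so substitute the real extracted length of your document.

```python
# Per-document extraction limits in characters, keyed by tier.
# "basic" comes from the warning text and "s1" from the answer above;
# check both against the current Indexer limits page before relying on them.
EXTRACTION_LIMITS = {"basic": 65_536, "s1": 4_000_000}


def truncated_chars(extracted_length: int, tier: str) -> int:
    """Return how many characters the indexer would drop (0 if none)."""
    limit = EXTRACTION_LIMITS[tier]
    return max(0, extracted_length - limit)


# Hypothetical example: a 265-page PDF averaging 1,500 characters per page.
doc_length = 265 * 1_500
for tier in EXTRACTION_LIMITS:
    print(tier, truncated_chars(doc_length, tier))
```

Under that assumed average, most of the document would be dropped on Basic and none of it on S1.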
I have the same issue with a 190-page PDF file. What if I split the PDF in half and uploaded the two halves to my container? I am trying to avoid upgrading, if possible.
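Whether two halves are enough depends on how much text each half contains, since the limit applies to each uploaded file. A rough sketch for planning the split, assuming you already have a per-page character count from some PDF text extractor (the function name `plan_split` and the 1,500 characters per page are hypothetical, and the indexer may count characters slightly differently than your extractor does):

```python
def plan_split(page_lengths, limit=65_536):
    """Group consecutive pages into parts whose text stays within limit.

    Returns a list of (first_page, last_page) tuples, 1-indexed and inclusive.
    """
    parts = []
    start = 0      # index of the first page in the current part
    running = 0    # characters accumulated in the current part
    for i, length in enumerate(page_lengths):
        if length > limit:
            raise ValueError(f"page {i + 1} alone exceeds the limit")
        if running + length > limit:
            # Close the current part before this page and start a new one.
            parts.append((start + 1, i))
            start, running = i, 0
        running += length
    if page_lengths:
        parts.append((start + 1, len(page_lengths)))
    return parts


# Hypothetical example: 190 pages at 1,500 characters each.
for first, last in plan_split([1_500] * 190):
    print(f"pages {first}-{last}")
```

With those assumed numbers the plan comes out to five parts rather than two, so a simple halving would still be truncated. Text-light pages would need fewer parts.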