I am trying to make a docs question answering program with AzureOpenAI and Langchain

LegoGames
October 31, 2023

llm = AzureOpenAI(openai_api_key=OPENAI_API_KEY, deployment_name=OPENAI_DEPLOYMENT_NAME, model_name=MODEL_NAME)



# Configure the location of the PDF file.
pdfReader = PdfReader('databorders.pdf')


# Extract the text from the PDF file.
raw_text = ''
for i, page in enumerate(pdfReader.pages):
    text = page.extract_text()
    if text:
        raw_text += text

# Show first 1000 characters of the text.
raw_text[:1000]


# Split the text into chunks of 1000 characters with 200 characters overlap.
text_splitter = CharacterTextSplitter(
    separator = "\n",
    chunk_size = 1000,
    chunk_overlap  = 200,
    length_function = len,
)
pdfTexts = text_splitter.split_text(raw_text)


# Show how many chunks of text are generated.
len(pdfTexts)

# Pass the text chunks to the Embedding Model from Azure OpenAI API to generate embeddings.
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, deployment=OPENAI_EMBEDDING_MODEL_NAME, client="azure", chunk_size=1)

# Use FAISS to index the embeddings. This will allow us to perform a similarity search on the texts using the embeddings.
# https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/faiss.html
pdfDocSearch = FAISS.from_texts(pdfTexts, embeddings)

# Create a Question Answering chain using the embeddings and the similarity search.
# https://docs.langchain.com/docs/components/chains/index_related_chains
chain = load_qa_chain(llm, chain_type="stuff")


# Perform first sample of question answering.
inquiry = "Who is the author of this book?"
docs = pdfDocSearch.similarity_search(inquiry)
chain.run(input_documents=docs, question=inquiry)

It gives this error:
openai.error.InvalidRequestError: The completion operation does not work with the specified model, gpt-4. Please choose different model and try again. You can learn more about which models can be used with each operation here: https://go.microsoft.com/fwlink/?linkid=2197993.

Answers

> It gives this error: openai.error.InvalidRequestError: The completion
> operation does not work with the specified model, gpt-4. Please choose
> a different model and try again. You can learn more about which models
> can be used with each operation here.

This error occurs when the model or deployment passed in the configuration does not support the operation being called. Here, the deployment given to the LLM hosts gpt-4, which does not support the completion operation.

According to Document-1 and Document-2, you need a text-davinci-003 model for completion and a text-embedding-ada-002 model for embeddings.

When I switched to deployments of those two models, the code ran and returned an answer.
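A mismatch like this can also be caught before any request is sent. The sketch below is a minimal pre-flight check, assuming a hand-maintained table that lists only the three models named in this thread. `MODEL_OPERATION` and `require_operation` are hypothetical names, not part of LangChain or the OpenAI SDK, and the table is illustrative rather than exhaustive.

```python
# Illustrative, non-exhaustive table: only the models mentioned in this thread.
MODEL_OPERATION = {
    "text-davinci-003": "completion",
    "gpt-4": "chat",
    "text-embedding-ada-002": "embedding",
}


def require_operation(model_name: str, operation: str) -> None:
    """Raise ValueError if model_name is listed with a different operation.

    Models missing from the table are not checked at all.
    """
    supported = MODEL_OPERATION.get(model_name)
    if supported is None:
        return  # unknown model: nothing to verify
    if supported != operation:
        raise ValueError(
            f"{model_name!r} is listed for {supported!r}, not {operation!r}"
        )


# Passes silently: text-davinci-003 is listed as a completion model.
require_operation("text-davinci-003", "completion")
```

Calling `require_operation(MODEL_NAME, "completion")` before constructing `AzureOpenAI` would flag a gpt-4 deployment up front instead of failing inside the chain.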

Code:

from langchain.llms import AzureOpenAI
from PyPDF2 import PdfReader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.faiss import FAISS
from langchain.chains.question_answering import load_qa_chain

OPENAI_API_KEY = "xxxxx"
OPENAI_DEPLOYMENT_NAME = "testxxxa"  # deployment name with the text-embedding-ada-002 model
deployment = "textxxx"  # deployment name with the text-davinci-003 model
openai_api_base1 = "xxxxxx"

# The LLM must point at a deployment whose model supports the completion operation.
llm = AzureOpenAI(
    openai_api_key=OPENAI_API_KEY,
    deployment_name=deployment,
    openai_api_base=openai_api_base1,
    openai_api_version="2022-12-01",
    openai_api_type="azure",
)

pdfReader = PdfReader('example.pdf')

raw_text = ''
for i, page in enumerate(pdfReader.pages):
    text = page.extract_text()
    if text:
        raw_text += text

raw_text[:1000]

text_splitter = CharacterTextSplitter(
    separator = "\n",
    chunk_size = 1000,
    chunk_overlap  = 200,
    length_function = len,
)
pdfTexts = text_splitter.split_text(raw_text)

len(pdfTexts)

embeddings = OpenAIEmbeddings(
    openai_api_key=OPENAI_API_KEY,
    deployment=OPENAI_DEPLOYMENT_NAME,
    openai_api_base=openai_api_base1,
    openai_api_type="azure",
    openai_api_version="2022-12-01",
    chunk_size=1,
)

pdfDocSearch = FAISS.from_texts(pdfTexts, embeddings)
chain = load_qa_chain(llm, chain_type="stuff")
inquiry = "Which month is specified?"
docs = pdfDocSearch.similarity_search(inquiry)
print(chain.run(input_documents=docs, question=inquiry))

Output:

 September

- NicolasR
- October 31, 2023 at 2:56 pm
In OpenAI, there are two main operations for text generation:
- completion
- chatCompletion
Some models can be used for completion (e.g. GPT-3.5 version 0301), others for chatCompletion (e.g. GPT-3.5 version 0613, GPT-4). As your error message says, gpt-4 does not work with the completion operation.

What is not visible in your code is that LangChain calls OpenAI with a completion operation when the chain built by load_qa_chain runs, because the llm you pass in is an AzureOpenAI object.

Doc: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#model-summary-table-and-region-availability

So in your case, pass a deployment whose model supports the completion operation when you create your llm:
```
llm = AzureOpenAI(openai_api_key=OPENAI_API_KEY, deployment_name=OPENAI_DEPLOYMENT_NAME, model_name=MODEL_NAME)
```
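If the deployment has to stay on gpt-4, another route that fits the completion/chatCompletion split described above is LangChain's chat wrapper, `AzureChatOpenAI`, in place of `AzureOpenAI`. The following is a sketch under assumptions: it targets the 0.0.x LangChain releases from around the time of this question, the API version `2023-05-15` and every credential value are placeholders, and `azure_chat_settings` and `build_chat_qa_chain` are hypothetical helper names.

```python
def azure_chat_settings(api_key, api_base, deployment_name,
                        api_version="2023-05-15"):
    """Collect keyword arguments for AzureChatOpenAI (langchain 0.0.x names)."""
    return {
        "openai_api_key": api_key,
        "openai_api_base": api_base,
        "openai_api_version": api_version,  # assumed version, adjust as needed
        "openai_api_type": "azure",
        "deployment_name": deployment_name,
    }


def build_chat_qa_chain(settings):
    """Build a "stuff" QA chain backed by a chat model.

    Imports are done lazily so this file loads without langchain installed;
    calling the function requires langchain and valid Azure credentials.
    """
    from langchain.chat_models import AzureChatOpenAI
    from langchain.chains.question_answering import load_qa_chain

    llm = AzureChatOpenAI(**settings)
    return load_qa_chain(llm, chain_type="stuff")


# Placeholder values only; replace them with your own resource details.
settings = azure_chat_settings(
    api_key="xxxxx",
    api_base="https://<your-resource>.openai.azure.com/",
    deployment_name="<your-gpt-4-deployment>",
)
```

With real values in `settings`, `chain = build_chat_qa_chain(settings)` can be used as the `chain` in the question, and `chain.run(input_documents=docs, question=inquiry)` is called the same way.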