I’m using LangChain with OpenAI to create embeddings from some PDF documents so I can ask questions about them. I’m working in Node.js and trying to save the vectors in MongoDB Atlas.
At service start, I call the `fromDocuments()` method on the `MongoDBAtlasVectorSearch` class. That method takes an embeddings model, for which I pass an instance of `OpenAIEmbeddings`, which calls the OpenAI Embeddings API. The problem is that this happens every time the service starts, so I’m running up OpenAI costs very quickly.
I see in the documentation that `fromDocuments()` adds the documents to the underlying Mongo collection, but nothing in the documentation explains how that collection is used later to reload previously saved vectors.
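This is the code I’m using to add documents:

```javascript
// Embeds every document in `docs` via OpenAI and inserts the vectors into this.collection
this.vectorStore = await MongoDBAtlasVectorSearch.fromDocuments(docs, embeddings, {
  collection: this.collection
});
```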
The `this.collection` parameter is the underlying collection I obtain through the MongoDB Node.js driver (I retrieve the collection by name before this code runs). The `embeddings` parameter is an instance of `OpenAIEmbeddings`.
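For context, the vectors that `fromDocuments()` writes are persisted as ordinary documents in that collection, which is what makes reusing them possible. The following is a minimal illustration of one such record, assuming LangChain's defaults of a `text` field and an `embedding` field with the chunk's metadata spread onto the same record (check the `textKey`/`embeddingKey` options of your LangChain version). `toStoredRecord` and the sample data are hypothetical.

```javascript
// Hypothetical helper showing the shape of one stored record.
// ASSUMPTION: mirrors LangChain's default textKey ("text") and
// embeddingKey ("embedding"), with metadata spread at the top level.
function toStoredRecord(doc, embedding, textKey = "text", embeddingKey = "embedding") {
  return {
    [textKey]: doc.pageContent, // original chunk text
    [embeddingKey]: embedding,  // vector returned by the embeddings API
    ...doc.metadata,            // e.g. source file and page number
  };
}

// Made-up chunk from a PDF with a made-up 3-dimensional vector
// (real OpenAI embedding vectors are much longer).
const record = toStoredRecord(
  { pageContent: "Refunds are issued within 30 days.", metadata: { source: "policy.pdf", page: 2 } },
  [0.12, -0.03, 0.88]
);
console.log(record);
```

Because each record already holds its vector, a later process can search the same collection without calling the embeddings API for the stored chunks again.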
Is what I’m trying to do even possible? Can previously created vectors be saved and reloaded, so the embeddings don’t have to be recreated after the initial load?
2 Answers
I was finally able to get this working using this documentation: https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/mongodb_atlas
It was as simple as this: load the underlying collection from Mongo using the MongoDB driver and, rather than calling `fromDocuments()` on `MongoDBAtlasVectorSearch`, construct the object directly and pass that collection in. Even though I pass in an `embeddings` object, it isn’t used to re-embed the documents when the collection already contains them; it is only used to embed the query text when you search, which is a single short string per search rather than every PDF chunk.

If the underlying collection is empty, it needs to be populated first. You do that by calling `fromDocuments()`, which creates the embeddings and adds the vectors to the collection automagically.

Hope this helps someone, because it was a lifesaver for me!
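A minimal sketch of the pattern described in this answer: reuse the stored vectors when the collection already has documents, and embed the PDFs once otherwise. It assumes `VectorStore` is a class shaped like LangChain's `MongoDBAtlasVectorSearch` (a constructor taking `(embeddings, { collection })` and a static `fromDocuments(docs, embeddings, { collection })`) and that `collection` is a MongoDB Node.js driver collection. `getVectorStore` and `loadDocs` are hypothetical names. Passing the class in keeps the sketch independent of the LangChain import path, which has moved between versions.

```javascript
// Hypothetical startup helper: embed once, reuse afterwards.
// ASSUMPTIONS: VectorStore behaves like MongoDBAtlasVectorSearch;
// collection is a MongoDB driver collection; loadDocs is your own
// function that loads and splits the PDFs into documents.
async function getVectorStore({ VectorStore, embeddings, collection, loadDocs }) {
  // Stop counting after the first match; we only need "empty or not".
  const existing = await collection.countDocuments({}, { limit: 1 });

  if (existing > 0) {
    // Vectors are already stored: wrap the collection without embedding
    // any documents. The embeddings object is still needed for queries.
    return new VectorStore(embeddings, { collection });
  }

  // Empty collection: load the PDFs, embed them once and persist the vectors.
  const docs = await loadDocs();
  return VectorStore.fromDocuments(docs, embeddings, { collection });
}
```

In the service you would call something like `getVectorStore({ VectorStore: MongoDBAtlasVectorSearch, embeddings, collection: this.collection, loadDocs })` at startup. This only checks whether the collection is non-empty; if the PDFs change, you would still need your own rule for when to re-embed them.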
---
You could take a look at LangChain’s remote retriever integration here: https://js.langchain.com/docs/modules/data_connection/retrievers/integrations/remote-retriever
It might solve your issue.