I’m using LangChain with OpenAI to create embeddings from some PDF documents so that I can ask questions about them. I’m working in NodeJS and attempting to save the vectors in MongoDB Atlas.

At service start, I call the fromDocuments() method on the MongoDBAtlasVectorSearch class. That method takes an embeddings model, for which I pass an instance of OpenAIEmbeddings, and that instance calls the OpenAI Embeddings API. The problem is that this happens every time the service starts, so I’m running up costs with OpenAI very quickly.

I see in the documentation that fromDocuments() adds the documents to the underlying Mongo collection, but nothing in the documentation indicates how that underlying collection can be used later to reload previously saved vectors.

This is the code I’m using to add documents:

this.vectorStore = await MongoDBAtlasVectorSearch.fromDocuments(docs, embeddings, {
  collection: this.collection
});

The parameter this.collection is the underlying collection I obtain via the MongoDB client for NodeJS (I retrieve the collection by name prior to this code). The embeddings parameter is an instance of OpenAIEmbeddings.

Is what I’m trying to do even possible? Can we save previously created vectors and reload them, so that the embeddings don’t have to be recreated after the initial load?

2 Answers


  1. Chosen as BEST ANSWER

    I was finally able to get this working using this documentation: https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/mongodb_atlas

    It was as simple as this: load the underlying collection from Mongo using the Mongo driver, and rather than calling fromDocuments() on MongoDBAtlasVectorSearch, all I had to do was construct the object and pass that collection in, like this:

    const embeddings = new OpenAIEmbeddings();
    this.vectorStore = new MongoDBAtlasVectorSearch(embeddings, { collection: this.collection });
    

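    Even though I pass in an embeddings object, constructing the store this way doesn't re-embed the documents that are already in the collection. The embeddings object is only used to embed the query text when a search runs.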

    If the underlying collection is empty, it needs to be populated first. You do that by calling fromDocuments(), which creates the embeddings and adds the vectors to the collection automagically:

    const embeddings = new OpenAIEmbeddings();
    this.vectorStore = await MongoDBAtlasVectorSearch.fromDocuments(docs, embeddings, {
      collection: this.collection
    });
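
    The two cases above could be combined into one startup helper that only pays for embeddings when the collection is empty. This is a rough sketch, not something taken from the LangChain docs: loadOrCreateVectorStore and loadDocs are hypothetical names, and the vector store class is passed in as a parameter (in practice that would be MongoDBAtlasVectorSearch) so the snippet needs no imports. It assumes collection is a collection object from the MongoDB NodeJS driver, whose countDocuments() method is used for the emptiness check.

    ```javascript
    // Hypothetical helper: reuse stored vectors if the collection has any,
    // otherwise embed the documents once and store them.
    // VectorStore is assumed to be LangChain's MongoDBAtlasVectorSearch class.
    async function loadOrCreateVectorStore({ VectorStore, collection, embeddings, loadDocs }) {
      // limit: 1 stops counting as soon as one document is found.
      const existing = await collection.countDocuments({}, { limit: 1 });

      if (existing > 0) {
        // Already populated: just wrap the collection. Nothing is embedded here.
        return new VectorStore(embeddings, { collection });
      }

      // Empty collection: load the documents (hypothetical callback), then
      // let fromDocuments() create the embeddings and insert the vectors.
      const docs = await loadDocs();
      return VectorStore.fromDocuments(docs, embeddings, { collection });
    }
    ```

    At service start this would be called as loadOrCreateVectorStore({ VectorStore: MongoDBAtlasVectorSearch, collection: this.collection, embeddings, loadDocs }), where loadDocs is whatever function loads and splits the PDFs. Keep in mind that a "collection is not empty" check will not notice new or changed PDFs, so those would still need to be added separately.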
    

    Hope this helps someone because it was a lifesaver for me!


  2. You should take a look at the LangChain class here: https://js.langchain.com/docs/modules/data_connection/retrievers/integrations/remote-retriever

    It could likely solve your issue.
