
Background Info

I have a Python application that uses LangChain and Ollama. Running it locally works perfectly fine because I have Ollama running on my machine.

What I want to do is host this application on a serverless platform (Google Cloud Run, for example), and to do that I need to containerize it. This is easy for the Python side of the application, but I am struggling to get the Ollama part right.

I need the Ollama server to be running so that LangChain can use it, and I cannot figure out how to get this right in the Dockerfile. I have tried multi-stage builds, the official Ollama Docker image, and installing from source, and all of these end up with the same issue: I get Ollama onto the container, but if I then RUN ollama serve, nothing else can run because the rest of the build waits for the Ollama server to exit.

I have also tried using nohup when starting the server, but whenever I try to pull a model with ollama pull <model>, it comes back asking whether Ollama is running. I have added waits and it still doesn't work.
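To make this concrete, the two patterns I keep ending up with look roughly like the following (the model name is just an example):

    # Attempt 1 -- directly in the Dockerfile: the build never gets past this
    # line, because RUN executes at build time and ollama serve does not exit
    RUN ollama serve
    RUN ollama pull llama3

    # Attempt 2 -- in a shell script, with the server backgrounded via nohup:
    # the pull still comes back asking whether Ollama is running
    nohup ollama serve > /dev/null 2>&1 &
    sleep 10
    ollama pull llama3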

I have tried using a docker-compose.yml file, but the example I found did not do what I needed. I have also tried multi-stage builds in the Dockerfile, building the Ollama server in the first stage and then using it in the second, but this resulted in the same issues as building everything in one stage.
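For reference, the compose layout I was experimenting with looked roughly like this, with Ollama as its own service next to the application (service names and model details are illustrative):

    # docker-compose.yml (sketch of the kind of layout I tried)
    services:
      ollama:
        image: ollama/ollama          # official Ollama image, serves on port 11434
        ports:
          - "11434:11434"
        volumes:
          - ollama_data:/root/.ollama # persist pulled models between restarts
      app:
        build: .                      # the Python/LangChain application
        environment:
          # point the Ollama client at the ollama service; LangChain's Ollama
          # wrapper can also take this address as its base_url
          - OLLAMA_HOST=http://ollama:11434
        depends_on:
          - ollama
    volumes:
      ollama_data: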

I have tried startup scripts used as entrypoints to get the container going, but they end with the same errors: Ollama doesn't start up and I can't pull models.

Questions

  1. My first question is: is it even possible to achieve what I am trying to do? Or would it be better to host this type of application on a VM instead, where I can install the required software?

  2. My second question is: if it is possible, has anyone done something similar who can shed some light on it or offer advice?

Any advice or help with this would be really appreciated!

2 Answers


  1. From an architectural perspective, I suggest installing and configuring Ollama as a standalone service on a VM or bare-metal server. On Linux systems the standard install registers a systemd unit, so the service can be managed with systemctl (e.g. systemctl status ollama); see the sketch after the list below.

    The rationale behind this recommendation includes:

    • Simplicity of managing Ollama as a service.
    • Limited benefit to running Ollama in a Docker container, unless you are sharing the base operating system with multiple tenants who do not have root access. Either way, the large language models (LLMs) have to be stored on a physical disk regardless of the deployment method.
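    As a rough sketch, assuming the standard Linux install script (curl -fsSL https://ollama.com/install.sh | sh), which registers an ollama systemd unit, day-to-day management looks like this (the model name is only an example):

    # Check and manage the Ollama systemd service
    sudo systemctl enable ollama    # start the service at boot
    sudo systemctl start ollama     # start it now
    systemctl status ollama         # verify it is running
    journalctl -u ollama -f         # follow the service logs

    # With the service up, models can be pulled as usual
    ollama pull llama3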
  2. Create a start_services.sh file:

    #!/bin/sh
    
    # Start the Ollama server in the background
    ollama serve &
    
    # Give the server a moment to start listening on its default port (11434)
    sleep 5
    
    # Pull <YOUR_MODEL_NAME> so it is available to the application
    ollama pull <YOUR_MODEL_NAME>
    
    # Keep the container alive by waiting on the background server process
    wait
    

    In your Dockerfile:

    # Copy the startup script into the image and make it executable
    COPY start_services.sh /app/start_services.sh
    RUN chmod +x /app/start_services.sh
    EXPOSE 11434
    # Run the startup script when the container starts
    CMD ["/app/start_services.sh"]
    

    Hope this helps; this is roughly how it worked for me.
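    If it is useful, a fuller Dockerfile along these lines might look like the sketch below. It is only a sketch: the python:3.11-slim base image, requirements.txt, and the use of the official Linux install script for Ollama are assumptions you would adapt to your project.

    # Sketch only -- base image, file names, and layout are assumptions
    FROM python:3.11-slim

    # curl is needed to fetch the Ollama install script
    RUN apt-get update && apt-get install -y curl ca-certificates && rm -rf /var/lib/apt/lists/*

    # Install the Ollama binary (the systemd setup the script attempts is not
    # used inside the container; start_services.sh launches the server instead)
    RUN curl -fsSL https://ollama.com/install.sh | sh

    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .

    RUN chmod +x /app/start_services.sh

    EXPOSE 11434
    CMD ["/app/start_services.sh"]

    If the Python application should run in the same container, start_services.sh can launch it after the pull (in place of the final wait), so that the container runs both the Ollama server and the app.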
