
I am using Chroma DB (0.4.8) in a Python 3.10 Flask REST API application. The application runs well on local developer machines (including Windows and OS X machines).

I am using the multi-stage Dockerfile below to package the application in an image based on python:3.10-slim (Debian 12 Bookworm). Images are built on GitHub Actions and deployed using the google-github-actions/deploy-cloudrun@v1 action:

FROM python:3.10-slim as base

ENV PYTHONFAULTHANDLER=1 \
    PYTHONHASHSEED=random \
    PYTHONUNBUFFERED=1

WORKDIR /app

# -------------------------------------

FROM base as builder

ENV PIP_DEFAULT_TIMEOUT=100 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    PIP_NO_CACHE_DIR=1 \
    POETRY_VERSION=1.6

RUN apt-get update --fix-missing && apt-get install -y --fix-missing build-essential

RUN pip install "poetry==$POETRY_VERSION"

COPY pyproject.toml ./

COPY chat_api ./chat_api
RUN poetry config virtualenvs.in-project true && \
    poetry install --only=main --no-root && \
    poetry build

# -------------------------------------

FROM base as final

COPY --from=builder /app/.venv ./.venv
COPY --from=builder /app/dist .
COPY docker-entrypoint.sh .
RUN ./.venv/bin/pip install *.whl
RUN ["chmod", "+x", "docker-entrypoint.sh"]

CMD ["./docker-entrypoint.sh"]

As I am using Poetry 1.6 to install the Python packages, here are the dependency specifications from my pyproject.toml file:

[tool.poetry.dependencies]
python = "^3.10"
flask = "^2.3.3"
langchain = "^0.0.279"
flask-api = "^3.1"
openai = "0.27.8"
chromadb = "0.4.8"
tiktoken = "^0.4.0"
flask-sqlalchemy = "^3.0.5"
sqlalchemy = "^2.0.20"
pymysql = "^1.1.0"
google-cloud-logging = "^3.6.0"
flask-httpauth = "^4.8.0"
flask-cors = "^4.0.0"
gunicorn = "^21.2.0"
flask-migrate = "^4.0.4"
cryptography = "^41.0.3"

When I run the image in Google Cloud Run or on a dev machine, the application loads successfully. However, as soon as a call is made to an endpoint that imports chromadb, the process crashes with this traceback:

[ERROR] Worker (pid:3) was sent SIGILL!
Uncaught signal: 4, pid=3, tid=3, fault_addr=3.
Extension modules: google._upb._message, grpc._cython.cygrpc, charset_normalizer.md, _cffi_backend, markupsafe._speedups, sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, greenlet._greenlet, yaml._yaml, pydantic.typing, pydantic.errors, pydantic.version, pydantic.utils, pydantic.class_validators, pydantic.config, pydantic.color, pydantic.datetime_parse, pydantic.validators, pydantic.networks, pydantic.types, pydantic.json, pydantic.error_wrappers, pydantic.fields, pydantic.parse, pydantic.schema, pydantic.main, pydantic.dataclasses, pydantic.annotated_types, pydantic.decorator, pydantic.env_settings, pydantic.tools, pydantic, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, multidict._multidict, yarl._quoting_c, aiohttp._helpers, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket, frozenlist._frozenlist, numexpr.interpreter (total: 56)
  File "/app/.venv/bin/gunicorn", line 8 in <module>
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 67 in run
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/base.py", line 236 in run
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/base.py", line 72 in run
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 202 in run
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 571 in manage_workers
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 642 in spawn_workers
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 609 in spawn_worker
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/base.py", line 142 in init_process
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 126 in run
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 70 in run_for_one
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 32 in accept
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 135 in handle
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 178 in handle_request
  File "/app/.venv/lib/python3.10/site-packages/flask/app.py", line 2213 in __call__
  File "/app/.venv/lib/python3.10/site-packages/flask/app.py", line 2190 in wsgi_app
  File "/app/.venv/lib/python3.10/site-packages/flask/app.py", line 1484 in full_dispatch_request
  File "/app/.venv/lib/python3.10/site-packages/flask/app.py", line 1469 in dispatch_request
  File "/app/.venv/lib/python3.10/site-packages/flask_httpauth.py", line 174 in decorated
  File "/app/.venv/lib/python3.10/site-packages/redacted/routes.py", line 39 in messages_post
  File "/app/.venv/lib/python3.10/site-packages/redacted/logic.py", line 25 in __init__
  File "/app/.venv/lib/python3.10/site-packages/redacted/logic.py", line 40 in _load_vector_store
  File "/app/.venv/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 119 in __init__
  File "/app/.venv/lib/python3.10/site-packages/chromadb/__init__.py", line 143 in Client
  File "/app/.venv/lib/python3.10/site-packages/chromadb/config.py", line 247 in instance
  File "/app/.venv/lib/python3.10/site-packages/chromadb/api/segment.py", line 82 in __init__
  File "/app/.venv/lib/python3.10/site-packages/chromadb/config.py", line 188 in require
  File "/app/.venv/lib/python3.10/site-packages/chromadb/config.py", line 244 in instance
  File "/app/.venv/lib/python3.10/site-packages/chromadb/config.py", line 293 in get_class
  File "/usr/local/lib/python3.10/importlib/__init__.py", line 126 in import_module
  File "<frozen importlib._bootstrap>", line 1050 in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "/app/.venv/lib/python3.10/site-packages/chromadb/segment/impl/manager/local.py", line 13 in <module>
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "/app/.venv/lib/python3.10/site-packages/chromadb/segment/impl/vector/local_persistent_hnsw.py", line 9 in <module>
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "/app/.venv/lib/python3.10/site-packages/chromadb/segment/impl/vector/local_hnsw.py", line 21 in <module>
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 674 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571 in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176 in create_module
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
Current thread 0x00003ef4e1198b80 (most recent call first):
Fatal Python error: Illegal instruction

The last coherent (to me) line in the traceback points to line 21 in chromadb/segment/impl/vector/local_hnsw.py, which contains only the statement import hnswlib. I deduce that the failure lies in the installation of the chroma-hnswlib package.

In the image’s virtual environment, under .venv/lib/python3.10/site-packages, I see the package’s chroma_hnswlib-0.7.2.dist-info folder and, adjacent to it, the compiled extension hnswlib.cpython-310-x86_64-linux-gnu.so.

My question is: why is my image failing to correctly install chroma-hnswlib, and how can I fix this?

UPDATE: I have modified my Dockerfile to use a single stage, which means the build-essential packages are now present in the resulting image. When I run the new image on my Windows machine (AMD Ryzen 7), the crash is no longer present. When I run the same image in Google Cloud Run, the crash is reproduced.

UPDATE 2: Until now, the images I’ve used were built in GitHub Actions. As an experiment, I built an image on my dev machine and deployed it directly to Cloud Run: it works. I’m now investigating which type of CPU GitHub Actions runs the build on.
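To compare the two machines, a quick sketch like the following (assuming a Linux host that exposes /proc/cpuinfo, as the GitHub Actions runners and Cloud Run instances do) prints the SIMD-related CPU feature flags; running it on both sides and diffing the output shows whether the build host advertises instructions the runtime host lacks:

```python
# List the SIMD-related CPU feature flags on a Linux host so the build
# machine and the runtime environment can be compared. On non-x86
# kernels the "flags" line may be absent, so default to an empty list.
flags = []
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = sorted(set(line.split(":", 1)[1].split()))
            break

simd_flags = [f for f in flags if f.startswith(("sse", "avx", "fma", "f16c"))]
print(simd_flags)
```

If the build host reports flags (e.g. an avx512 variant) that the runtime host does not, a native extension compiled with -march=native on the build host can fault with SIGILL at runtime.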

2 Answers


  1. You wrote:

    FROM base as final
    
    COPY --from=builder /app/.venv ./.venv
    COPY --from=builder /app/dist .
    ...
    

    Prefer:

    FROM builder as final
    
    COPY ...
    

    The trouble was that dynamically linked *.so libraries
    were being installed in a location that you neglected to copy over,
    leading to lossage.

    Symlink .venv to /app if need be.
    Or COPY it.
    Or arrange for the relevant /app directory to appear in PYTHONPATH.

  2. I was able to get past my illegal instruction errors with Chroma by setting the environment variable HNSWLIB_NO_NATIVE=1 before running pip install chromadb. Looking at the source code, this removes use of the -march=native compiler flag. As @CharlesDuffy indicates in a comment above, illegal instruction indicates a difference in CPU features between where you built it and where you’re running it. So this workaround, if not ideal, at least makes sense.
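    Applied to a Dockerfile like the one in the question, the workaround might look like this sketch (HNSWLIB_NO_NATIVE is the variable described above; the package versions simply mirror the question and are illustrative):

    ```dockerfile
    FROM python:3.10-slim as builder

    # Disable -march=native so chroma-hnswlib is compiled for a generic
    # x86-64 target rather than the build machine's exact CPU.
    ENV HNSWLIB_NO_NATIVE=1

    # A C++ toolchain is still needed to build the extension from source.
    RUN apt-get update && apt-get install -y build-essential
    RUN pip install chromadb==0.4.8
    ```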

    In my case, I’m building on AWS CodeBuild and running in Lambda. One idea to (maybe?) ensure the CPUs are the same is the new support for using Lambda runtimes in CodeBuild, but you can’t use docker commands there (a further suggestion is to use podman instead—maybe I’ll try it sometime, but for now 🤷‍♂️, things are working and it all seems fine).
