I am trying to create a SageMaker endpoint for model inference using the Build your own algorithm container example (https://sagemaker-examples.readthedocs.io/en/latest/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.html), but I am having an issue installing NumPy while the image is being built.
We previously got this to work with our old model, but the new Vowpal Wabbit model requires numpy, scikit-learn, pandas and the vowpalwabbit library, and installing them causes the docker build to fail. I'm not sure whether we should keep using this container or migrate to a Python or SageMaker one, but whichever we use would need to support nginx.
EDIT: Forgot to mention that the image is created successfully when I build it locally, but the build fails when it runs through CloudFormation.
Dockerfile here:
# This is a Python 3 image that uses the nginx, gunicorn, flask stack
# for serving inferences in a stable way.
FROM ubuntu:18.04
# Retrieves information about what packages can be installed
RUN apt-get -y update && \
    apt-get install -y --no-install-recommends \
        wget \
        python3-pip \
        python3.8 \
        python3-setuptools \
        nginx \
        ca-certificates && \
    rm -rf /var/lib/apt/lists/*
# Set python 3.8 as default
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.8 1
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
# Get all python packages without excess cache created by pip.
COPY requirements.txt .
RUN pip3 install --upgrade pip setuptools wheel
RUN pip3 --no-cache-dir install -r requirements.txt
# Set some environment variables. PYTHONUNBUFFERED keeps Python from buffering our standard
# output stream, which means that logs can be delivered to the user quickly. PYTHONDONTWRITEBYTECODE
# keeps Python from writing the .pyc files which are unnecessary in this case. We also update
# PATH so that the train and serve programs are found when the container is invoked.
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"
ENV PYTHONPATH=/model_contents
# Set up the program in the image
COPY bandit/ /opt/program/
WORKDIR /opt/program/
# create directories for storing model and vectorizer
RUN mkdir model && mkdir vectorizer
# Give permissions to run scripts
RUN chmod +x /opt/program/serve && chmod +x /opt/program/train
requirements.txt here:
sagemaker==2.25.1
typing-extensions==3.7.4.3
numpy==1.20.1
boto3==1.17.12
awscli==1.19.12
python-dotenv==0.15.0
flask==1.1.2
scikit-learn==1.0.0
pandas==1.3.5
vowpalwabbit==8.11.0
Full traceback here:
Running setup.py install for numpy: started
Running setup.py install for numpy: finished with status 'error'
Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-cd653krx/numpy/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-q3eo46tw-record/install-record.txt --single-version-externally-managed --compile:
Running from numpy source directory.
Note: if you need reliable uninstall behavior, then install
with pip instead of using `setup.py install`:
- `pip install .` (from a git repo or downloaded source
release)
- `pip install numpy` (last NumPy release on PyPi)
Cythonizing sources
Processing numpy/random/_bounded_integers.pxd.in
Processing numpy/random/_bounded_integers.pyx.in
Traceback (most recent call last):
File "/tmp/pip-build-cd653krx/numpy/tools/cythonize.py", line 53, in process_pyx
import Cython
ModuleNotFoundError: No module named 'Cython'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/tmp/pip-build-cd653krx/numpy/tools/cythonize.py", line 234, in <module>
main()
File "/tmp/pip-build-cd653krx/numpy/tools/cythonize.py", line 230, in main
find_process_files(root_dir)
File "/tmp/pip-build-cd653krx/numpy/tools/cythonize.py", line 221, in find_process_files
process(root_dir, fromfile, tofile, function, hash_db)
File "/tmp/pip-build-cd653krx/numpy/tools/cythonize.py", line 187, in process
processor_function(fromfile, tofile)
File "/tmp/pip-build-cd653krx/numpy/tools/cythonize.py", line 90, in process_tempita_pyx
process_pyx(pyxfile, tofile)
File "/tmp/pip-build-cd653krx/numpy/tools/cythonize.py", line 60, in process_pyx
raise OSError(msg) from e
OSError: Cython needs to be installed in Python as a module
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-cd653krx/numpy/setup.py", line 450, in <module>
setup_package()
File "/tmp/pip-build-cd653krx/numpy/setup.py", line 432, in setup_package
generate_cython()
File "/tmp/pip-build-cd653krx/numpy/setup.py", line 237, in generate_cython
raise RuntimeError("Running cythonize failed!")
RuntimeError: Running cythonize failed!
2 Answers
I solved the issue. The numpy version was conflicting with the rest of the packages, so I downgraded it, which fixed the build.
There are 2 ways to get around the issue:
1. Add the numpy version you need to your requirements.txt (the preferred way, since it lets you manage your dependencies and versions better).
2. Install the dependency directly in the Dockerfile.
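The second option might look something like the sketch below. It is only an illustration: `NUMPY_VERSION` is a hypothetical build argument that does not appear in the original Dockerfile, and the value to pass is whichever version you have confirmed works with your other packages. If you go this route, the `numpy` line in requirements.txt would need to be removed or set to the same version, otherwise the later `pip3 install -r requirements.txt` step will replace what was installed here.

```dockerfile
# Hypothetical build argument (not in the original Dockerfile). Supply it with:
#   docker build --build-arg NUMPY_VERSION=<version> .
# If it is left unset, the pip command below fails on the empty "numpy==" pin.
ARG NUMPY_VERSION

# Install numpy on its own, before the rest of the requirements.
RUN pip3 --no-cache-dir install "numpy==${NUMPY_VERSION}"

# Then install everything else as before.
COPY requirements.txt .
RUN pip3 --no-cache-dir install -r requirements.txt
```

For the first option the Dockerfile stays as it is; only the `numpy==...` line in requirements.txt changes.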
I work at AWS and my opinions are my own. Thanks, Raghu