I am trying to use PyTorch with a GPU in my Docker container.
1. On the host –
I have nvidia-docker installed, the CUDA driver, etc.
Here is the nvidia-smi output from the host:
Fri Mar 20 04:29:49 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 33C P8 28W / 149W | 16MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1860 G /usr/lib/xorg/Xorg 15MiB |
+-----------------------------------------------------------------------------+
2. On the Docker container (Dockerfile for the app – Docker Compose file below) –
FROM ubuntu:latest
FROM dsksd/pytorch:0.4
#FROM nvidia/cuda:10.1-base-ubuntu18.04
#FROM nablascom/cuda-pytorch
#FROM nvidia/cuda:10.0-base
RUN apt-get update -y --fix-missing
RUN apt-get install -y python3-pip python3-dev build-essential
RUN apt-get install -y sudo curl
#RUN sudo apt-get install -y nvidia-container-toolkit
#RUN apt-get install -y curl python3.7 python3-pip python3.7-dev python3.7-distutils build-essential
#RUN apt-get install -y curl
#RUN apt-get install -y sudo
#RUN curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
#RUN sudo dpkg -i cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
#RUN sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
#RUN sudo apt-get install cuda -y
#----------
# Add the package repositories
#RUN distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
#RUN curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
#RUN curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
#RUN sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
#RUN sudo systemctl restart docker
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:/usr/local/cuda-10.1/compat/
ENV PYTHONPATH $PATH
#----------
ENV LC_ALL=mylocale.utf8
COPY . /app
WORKDIR /app
RUN pip3 install -r requirements.txt
ENTRYPOINT ["python3"]
EXPOSE 5000
CMD ["hook.py"]
When I try running my code on the GPU I run into:
>>> torch.cuda.current_device()
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=50 error=100 : no CUDA-capable device is detected
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 386, in current_device
_lazy_init()
File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 193, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:50
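For reference, the failing call can be guarded instead of crashing: `torch.cuda.is_available()` returns `False` rather than raising when no device is visible. A minimal sketch (the `pick_device` helper is hypothetical, not part of my code):

```python
# Minimal sketch: choose a device string without crashing when CUDA is absent.
# The pick_device helper is hypothetical, not part of the original code.
def pick_device():
    try:
        import torch
    except ImportError:
        return "cpu"  # torch not installed at all
    # is_available() returns False instead of raising the
    # "no CUDA-capable device is detected" RuntimeError
    return "cuda" if torch.cuda.is_available() else "cpu"

print(pick_device())
```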
I start the containers with: docker-compose up --build
Here is my docker-compose.yaml file:
version: '3.6'
services:
  rdb:
    image: mysql:5.7
    #restart: always
    environment:
      MYSQL_DATABASE: 'c_rdb'
      MYSQL_USER: 'user'
      MYSQL_PASSWORD: 'password'
      MYSQL_ROOT_PASSWORD: '123123'
    #ports:
    #  - '3306:3306'
    #expose:
    #  - '3306'
    volumes:
      - rdb-data:/var/lib/mysql
      - ./init-db/init.sql:/docker-entrypoint-initdb.d/init.sql
  mongo:
    image: mongo
    #restart: always
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: 12312323
      MONGO_INITDB_DATABASE: chronicler_ndb
    volumes:
      - ndb-data:/data/db
      - ./init-db/init.js:/docker-entrypoint-initdb.d/init.js
    ports:
      - '27017-27019:27017-27019'
  mongo-express:
    image: mongo-express
    #restart: always
    depends_on:
      - mongo
      - backend
    ports:
      - 8081:8081
    environment:
      ME_CONFIG_MONGODB_ADMINUSERNAME: rooer
      ME_CONFIG_MONGODB_ADMINPASSWORD: 123123
  redis:
    image: redis:latest
    command: ["redis-server", "--appendonly", "yes"]
    hostname: redis
    #ports:
    #  - "6379:6379"
    volumes:
      - cache-data:/data
  backend:
    build: ./app
    ports:
      - "5000:5000"
    volumes:
      - backend-data:/code
    links:
      - rdb
      - redis
volumes:
  rdb-data:
    name: c-relational-data
  ndb-data:
    name: c-nosql-data
  cache-data:
    name: redis-data
  backend-data:
    name: backend-engine
2 Answers
It needs the runtime option, but that option is not available in Compose file format 3. One workaround is to run the container directly with docker run and the --runtime=nvidia argument. I also recommend using an image built by NVIDIA instead of ubuntu:latest.
For more information, you can read the issue here.
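For completeness, a compose-side sketch (not tested against your setup, and it assumes the NVIDIA container runtime is installed on the host): file format 2.3/2.4 does accept a `runtime` key, so downgrading the compose version is an alternative to plain docker run:

```yaml
# Sketch, assuming the nvidia runtime is registered with the Docker daemon.
# Compose file format 2.3/2.4 supports the per-service runtime key.
version: '2.3'
services:
  backend:
    build: ./app
    runtime: nvidia
```

The equivalent one-off test outside compose would be `docker run --runtime=nvidia <image> nvidia-smi`.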
I got the cudaErrorNoDevice error (defined by cudaError_t). In my case it appeared after I closed my laptop's lid (suspend, not shutdown) while Docker kept running; everything had worked fine before that. In this situation, restarting Docker fixes it.