For some reason, no Docker container with CUDA can see my GPU.
When I run this:
docker run --gpus=all --rm nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
I get this output:
...
Error: only 0 Devices available, 1 requested. Exiting.
The CUDA container is unable to find my GPU.
I’ve found plenty of similar issues in forums, but with no satisfactory answer.
Has anyone found the reason this happens with WSL2 / Docker Desktop / Win10 / Ubuntu 20.04?
I have the latest NVIDIA/CUDA drivers and the latest versions of WSL2 and Docker Desktop.
But nvidia-smi and nvcc --version both work:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03 Driver Version: 555.85 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4060 Ti On | 00000000:01:00.0 On | N/A |
| 0% 53C P8 16W / 165W | 1045MiB / 16380MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 21 G /Xwayland N/A |
| 0 N/A N/A 23 G /Xwayland N/A |
+-----------------------------------------------------------------------------------------+
This also works, so it seems to be purely CUDA-related:
/mnt/c/Users/pavel$ docker run --rm --gpus=all ubuntu nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4060 Ti (UUID: GPU-28820d91-b332-b4ba-f1c8-5508048ce1f7)
My environment:
wsl --version
WSL version: 2.1.5.0
Kernel version: 5.15.146.1-2
WSLg version: 1.0.60
MSRDC version: 1.2.5105
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.4412
docker info
Client:
Version: 26.1.1
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.14.0-desktop.1
Path: C:\Program Files\Docker\cli-plugins\docker-buildx.exe
compose: Docker Compose (Docker Inc.)
Version: v2.27.0-desktop.2
Path: C:\Program Files\Docker\cli-plugins\docker-compose.exe
debug: Get a shell into any image or container (Docker Inc.)
Version: 0.0.29
Path: C:\Program Files\Docker\cli-plugins\docker-debug.exe
dev: Docker Dev Environments (Docker Inc.)
Version: v0.1.2
Path: C:\Program Files\Docker\cli-plugins\docker-dev.exe
extension: Manages Docker extensions (Docker Inc.)
Version: v0.2.23
Path: C:\Program Files\Docker\cli-plugins\docker-extension.exe
feedback: Provide feedback, right in your terminal! (Docker Inc.)
Version: v1.0.4
Path: C:\Program Files\Docker\cli-plugins\docker-feedback.exe
init: Creates Docker-related starter files for your project (Docker Inc.)
Version: v1.1.0
Path: C:\Program Files\Docker\cli-plugins\docker-init.exe
sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
Version: 0.6.0
Path: C:\Program Files\Docker\cli-plugins\docker-sbom.exe
scout: Docker Scout (Docker Inc.)
Version: v1.8.0
Path: C:\Program Files\Docker\cli-plugins\docker-scout.exe
Server:
Containers: 11
Running: 6
Paused: 0
Stopped: 5
Images: 47
Server Version: 26.1.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version: e377cd56a71523140ca6ae87e30244719194a521
runc version: v1.1.12-0-g51d5e94
init version: de40ad0
Security Options:
seccomp
Profile: unconfined
Kernel Version: 5.15.146.1-microsoft-standard-WSL2
Operating System: Docker Desktop
OSType: linux
Architecture: x86_64
CPUs: 5
Total Memory: 39.18GiB
Name: docker-desktop
ID: 88425de8-c396-4a90-9fea-afb64822deaa
Docker Root Dir: /var/lib/docker
Debug Mode: false
HTTP Proxy: http.docker.internal:3128
HTTPS Proxy: http.docker.internal:3128
No Proxy: hubproxy.docker.internal
Labels:
com.docker.desktop.address=npipe://\\.\pipe\docker_cli
Experimental: false
Insecure Registries:
hubproxy.docker.internal:5555
127.0.0.0/8
Live Restore Enabled: false
WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support
WARNING: daemon is not using the default seccomp profile
nvidia-smi
Fri May 31 08:47:59 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.85 Driver Version: 555.85 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Driver-Model | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4060 Ti WDDM | 00000000:01:00.0 On | N/A |
| 0% 54C P8 16W / 165W | 1083MiB / 16380MiB | 2% Default |
| | | N/A |
Ubuntu 20.04
I’ve already tried multiple installs and clean reinstalls of my NVIDIA drivers, the CUDA Toolkit, the NVIDIA Container Toolkit, etc.
From what I’ve found, there is a big difference in what people have to install in their Win10/WSL2 environment to get CUDA working: some install only the latest NVIDIA driver, some install both the CUDA Windows Toolkit and the CUDA WSL-Ubuntu Toolkit, and some had to install the NVIDIA Container Toolkit while others did not.
I got into an endless loop of trying all possible combinations of installations, but it seems like I’m missing something.
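For reference, this is roughly how I’ve been checking which pieces are actually present inside the WSL distro; these are just the commands I use, so they may need adjusting for other setups:
# driver comes from Windows and is exposed into WSL here (no Linux driver should be installed inside WSL)
ls /usr/lib/wsl/lib/
nvidia-smi
# CUDA toolkit inside the distro, if any
nvcc --version
# NVIDIA Container Toolkit pieces, if installed in the distro
dpkg -l | grep -i nvidia-container
nvidia-ctk --version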
Has anyone faced the same and found a solution? Thank you!!
2 Answers
If anyone faces this issue with driver version 555.85 (CUDA 12.5), the solution is to downgrade to 552.22 (CUDA 12.4): https://www.nvidia.com/download/driverResults.aspx/224154/en-us/
If you only need to run CUDA containers, the steps are:
1. Delete all NVIDIA and CUDA packages from WSL (see the sketch below).
2. Uninstall the CUDA Toolkit in Windows.
3. Download the 552.22 driver, which ships with CUDA 12.4.
4. Run a clean installation.
5. Reboot.
6. docker run --gpus all --rm nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
Should work now :)
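For steps 1 and 2, this is roughly what the WSL-side cleanup looks like; the package patterns are assumptions based on a stock apt install, so check the dpkg listing first and drop any pattern that matches nothing:
# see which NVIDIA/CUDA packages the WSL Ubuntu distro currently has
dpkg -l | grep -Ei 'nvidia|cuda'
# purge the CUDA toolkit and NVIDIA userspace packages from the distro
sudo apt-get purge --autoremove -y '^nvidia-.*' '^libnvidia-.*' '^cuda.*' '^libcudnn.*'
# do NOT install a Linux NVIDIA driver inside WSL afterwards; the Windows driver provides it
The CUDA Toolkit on the Windows side is removed through Apps & Features as usual.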
I had the same issue, and downgrading the driver version didn’t help (currently 556.12).
Instead, I found a comment on GitHub saying to set no-cgroups to false in /etc/nvidia-container-runtime/config.toml. Then I restarted Docker and everything works fine.
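For reference, the relevant part of /etc/nvidia-container-runtime/config.toml ends up looking roughly like this (the exact surrounding keys differ between toolkit versions):
[nvidia-container-cli]
# was true before; containers then reported 0 available devices
no-cgroups = false
After saving it, restart Docker (or run wsl --shutdown from Windows and start Docker Desktop again) and re-run the nbody benchmark.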