For some reason, no CUDA-enabled Docker container can see my GPU.

When I run this:
docker run --gpus=all --rm nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
I get this output:

...
Error: only 0 Devices available, 1 requested.  Exiting.

The CUDA container is unable to find my GPU.
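
One low-level check that might help narrow it down (the paths are what I'd expect the NVIDIA container runtime to inject on WSL2, so treat them as assumptions): see whether the CUDA driver library and the WSL GPU device actually show up inside a plain container.

# does the runtime inject the CUDA driver library?
docker run --rm --gpus=all ubuntu sh -c 'ls -l /usr/lib/x86_64-linux-gnu/libcuda*'
# is the WSL GPU paravirtualization device visible?
docker run --rm --gpus=all ubuntu ls -l /dev/dxg

If libcuda.so.1 is missing there, the problem would be in the library injection rather than in the CUDA application itself.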

I've found plenty of similar issues in forums, but with no satisfactory answer.
Have any of you found the reason this happens with WSL2 / Docker Desktop / Win10 / Ubuntu 20.04?
I have the latest NVIDIA driver and CUDA versions, and the latest versions of WSL2 and Docker Desktop.

But nvidia-smi and nvcc --version both work:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   53C    P8             16W /  165W |    1045MiB /  16380MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        21      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        23      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+

This also works, so it seems purely CUDA-related:

/mnt/c/Users/pavel$ docker run --rm  --gpus=all ubuntu nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4060 Ti (UUID: GPU-28820d91-b332-b4ba-f1c8-5508048ce1f7)
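
As far as I understand it, nvidia-smi only needs the driver's "utility" capability, while the nbody sample goes through the CUDA driver API and needs "compute", so the two tests can disagree. One thing worth trying (syntax as I recall it from the Docker docs) is to request the compute capability explicitly:

docker run --rm --gpus 'all,"capabilities=compute,utility"' nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark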

My environment:

wsl --version
WSL version: 2.1.5.0
Kernel version: 5.15.146.1-2
WSLg version: 1.0.60
MSRDC version: 1.2.5105
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.4412

docker info

Client:
 Version:    26.1.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.14.0-desktop.1
    Path:     C:\Program Files\Docker\cli-plugins\docker-buildx.exe
  compose: Docker Compose (Docker Inc.)
    Version:  v2.27.0-desktop.2
    Path:     C:\Program Files\Docker\cli-plugins\docker-compose.exe
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.29
    Path:     C:\Program Files\Docker\cli-plugins\docker-debug.exe
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     C:\Program Files\Docker\cli-plugins\docker-dev.exe
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.23
    Path:     C:\Program Files\Docker\cli-plugins\docker-extension.exe
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.4
    Path:     C:\Program Files\Docker\cli-plugins\docker-feedback.exe
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.1.0
    Path:     C:\Program Files\Docker\cli-plugins\docker-init.exe
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     C:\Program Files\Docker\cli-plugins\docker-sbom.exe
  scout: Docker Scout (Docker Inc.)
    Version:  v1.8.0
    Path:     C:\Program Files\Docker\cli-plugins\docker-scout.exe

Server:
 Containers: 11
  Running: 6
  Paused: 0
  Stopped: 5
 Images: 47
 Server Version: 26.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e377cd56a71523140ca6ae87e30244719194a521
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
 Kernel Version: 5.15.146.1-microsoft-standard-WSL2
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 5
 Total Memory: 39.18GiB
 Name: docker-desktop
 ID: 88425de8-c396-4a90-9fea-afb64822deaa
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=npipe://\\.\pipe\docker_cli
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support
WARNING: daemon is not using the default seccomp profile

nvidia-smi
Fri May 31 08:47:59 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.85                 Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti   WDDM  |   00000000:01:00.0  On |                  N/A |
|  0%   54C    P8             16W /  165W |    1083MiB /  16380MiB |      2%      Default |
|                                         |                        |                  N/A |

Ubuntu 20.04

I've already tried multiple installs and clean reinstalls of my NVIDIA driver, the CUDA Toolkit, the NVIDIA Container Toolkit, etc.

From what I've found, there is a big difference in what people install into their Win10/WSL2 environment to get CUDA working. Some install only the latest NVIDIA driver. Some install both the CUDA Win10 Toolkit and the CUDA WSL-Ubuntu Toolkit. Also, some people had to install the NVIDIA Container Toolkit and some did not.

I got into an infinite circle of trying all possible combinations of installations, but it seems like I’m missing something.
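
In case it helps anyone compare setups, my understanding of the intended layout (please correct me if this is wrong) is: the NVIDIA driver lives only on the Windows side, WSL sees it through the libraries mounted under /usr/lib/wsl/lib, and no Linux display driver should be installed inside the distro. A quick sanity check from inside WSL:

# driver-provided CUDA library mounted in from Windows
ls -l /usr/lib/wsl/lib/libcuda*
# should ideally return nothing -- no Linux driver packages inside WSL
dpkg -l | grep -i nvidia-driver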

Has anyone faced the same and found a solution? Thank you!!

2 Answers


  1. Chosen as BEST ANSWER

    If anyone faces this issue with driver version 555.85 (CUDA 12.5), the solution is to downgrade to 552.22 (CUDA 12.4): https://www.nvidia.com/download/driverResults.aspx/224154/en-us/

    If you only need to run CUDA containers, the steps are:

    Delete all NVIDIA and CUDA packages from WSL (a rough sketch of this step is after the command below).

    Uninstall the CUDA Toolkit in Windows.

    Download the 552.22 driver, which has CUDA 12.4 inside.

    Run a clean installation.

    Reboot.

    docker run --gpus all --rm nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

    Should work now :)
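
    For the package-removal step on the WSL side, a rough sketch (package names are a guess at a typical install, so check what dpkg actually lists first):

    # see which NVIDIA/CUDA packages are installed in the WSL distro
    dpkg -l | grep -Ei 'nvidia|cuda'
    # purge them (adjust the patterns to what the listing shows)
    sudo apt-get remove --purge '^nvidia-.*' '^libnvidia-.*' '^cuda.*'
    sudo apt-get autoremove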


  2. I had the same issue, and downgrading the driver version didn’t help (currently 556.12).

    Instead, I found a comment on GitHub saying to change no-cgroups to false in /etc/nvidia-container-runtime/config.toml:

    sudo sed -i 's/no-cgroups = true/no-cgroups = false/' /etc/nvidia-container-runtime/config.toml
    

    Then I restarted Docker and it works fine.
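
    To double-check the change took effect before restarting Docker, just read the setting back:

    grep -n 'no-cgroups' /etc/nvidia-container-runtime/config.toml

    It should now print no-cgroups = false.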
