I’m trying to train a model with Yolov8. Everything was good but today I suddenly notice getting this warning apparently related to PyTorch
and cuDNN
. In spite the warning, the training seems to be progressing though. I’m not sure if it has any negative effects on the training progress.
site-packages/torch/autograd/graph.py:744: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
What is the problem and how to address this?
Here is the output of collect_env
:
Collecting environment information...
PyTorch version: 2.3.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.29.3
Libc version: glibc-2.31
Python version: 3.9.7 | packaged by conda-forge | (default, Sep 2 2021, 17:58:34) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-69-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100 80GB PCIe
Nvidia driver version: 515.105.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.8.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] onnx==1.16.0
[pip3] onnxruntime==1.17.3
[pip3] onnxruntime-gpu==1.17.1
[pip3] onnxsim==0.4.36
[pip3] optree==0.11.0
[pip3] torch==2.3.0+cu118
[pip3] torchaudio==2.3.0+cu118
[pip3] torchvision==0.18.0+cu118
[pip3] triton==2.3.0
[conda] numpy 1.24.4 pypi_0 pypi
[conda] pytorch-quantization 2.2.1 pypi_0 pypi
[conda] torch 2.1.1+cu118 pypi_0 pypi
[conda] torchaudio 2.1.1+cu118 pypi_0 pypi
[conda] torchmetrics 0.8.0 pypi_0 pypi
[conda] torchvision 0.16.1+cu118 pypi_0 pypi
[conda] triton 2.1.0 pypi_0 pypi
2
Answers
June 2024 Solution: Upgrade torch version to 2.3.1 to fix it:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
In version 2.3.0 of pytorch, it prints this unwanted warning even if no exception is thrown: see https://github.com/pytorch/pytorch/pull/125790
As you mentioned, though, the training is processing correctly. If you want to get rid of this warning, you should revert to torch 2.2.2 (you then also have to revert torchvision to 0.17.2):