I’m using a cluster to train my machine learning model (TensorFlow) in Jupyter Notebook. The cluster already had JupyterHub (Python 3.7.5), CUDA, and cuDNN installed before I started using it. It runs Ubuntu 18.04 with GCC version 8.4.0. When I execute the nvidia-smi command, I get the following output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
(...)
|===============================+======================+======================|
| 0 Quadro P4000 On | 00000000:00:05.0 Off | N/A |
(...)
I am not the system administrator, so I installed tensorflow-gpu using pip. However, when I train the model, Jupyter Notebook and TensorFlow do not detect any GPU, as can be seen below:
Code:
import tensorflow as tf
print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))
Output:
2.8.3
[]
I hope that you can help me.
Here are the steps I have already taken:
- Reinstalled the TensorFlow packages;
- Installed a different version of TensorFlow.
2 Answers
Thanks for the answer. I solved the problem by following the steps below.

1. I used the ls /usr/local command to check the installed version of the CUDA Toolkit on the system.
2. I made changes to the PATH variable in .bashrc, as sketched right after this list.
3. I ran source .bashrc to apply the changes.
4. Now the nvcc -V command works and shows the correct CUDA version for TensorFlow.
5. I removed the old packages with pip uninstall tensorflow-gpu and pip uninstall tensorflow-estimator.
6. I installed a matching build with pip install tensorflow-gpu==2.3.0 (compatible with CUDA 10.1 according to this link).
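A minimal sketch of that .bashrc edit, assuming the CUDA 10.1 toolkit found by ls /usr/local lives at /usr/local/cuda-10.1 (adjust both paths to whatever that listing actually shows):

# Hypothetical install location; point these at the toolkit directory found under /usr/local.
export PATH=/usr/local/cuda-10.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH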
Now, when I rerun the check from the question, the GPU finally shows up in the output instead of an empty list.
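A sketch of that check, assuming it is the same code as in the question; with tensorflow-gpu 2.3.0 installed it should print the new version and a non-empty device list:

import tensorflow as tf

# The version should now read 2.3.0, and the list should contain one
# PhysicalDevice entry for the Quadro P4000 instead of being empty.
print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))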
Thank you!
The TensorFlow documentation lists tested build configurations for each release. For TensorFlow 2.8.0, GCC 7.3.1 is the listed compiler, while your cluster uses GCC 8.4.0. This could be the reason why TensorFlow does not detect the GPU; I had a similar issue where I used a slightly different version than the one specified, and my GPU also did not show up.
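One way to check for this kind of version mismatch is tf.sysconfig.get_build_info(), available in recent TensorFlow 2.x releases, which reports the CUDA and cuDNN versions the installed wheel was built against; comparing those with what nvcc -V and nvidia-smi show usually explains an empty GPU list. A minimal sketch:

import tensorflow as tf

# Versions this wheel was compiled against; compare them with the
# toolkit versions actually installed on the cluster.
info = tf.sysconfig.get_build_info()
print("built with CUDA:", info.get("cuda_version"))
print("built with cuDNN:", info.get("cudnn_version"))
print("visible GPUs:", tf.config.list_physical_devices('GPU'))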