TensorFlow does not detect the GPU on an Ubuntu cluster

John
May 26, 2023
345 views
0 votes
2 Answers

I’m using a cluster to train a TensorFlow machine learning model in a Jupyter Notebook. The cluster already had JupyterHub (Python 3.7.5), CUDA, and cuDNN installed before I started using it. It runs Ubuntu 18.04 with GCC version 8.4.0. When I execute the nvidia-smi command, I get the following output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
(...)
|===============================+======================+======================|
|   0  Quadro P4000        On   | 00000000:00:05.0 Off |                  N/A |
(...)

I am not the system administrator, so I installed TensorFlow-GPU using pip. However, when I train the model, Jupyter Notebook and TensorFlow do not detect any GPU, as can be seen below:

Code:

import tensorflow as tf

print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))

Output:

2.8.3
[]

I hope that you can help me.

Here are the steps I have already taken:

- Reinstalled the TensorFlow packages.
- Installed a different version of TensorFlow.

Answers

Chosen as BEST ANSWER
- John
- May 26, 2023 at 3:20 am
- 0 votes
Thanks for the answer. I solved the problem by following the steps below.
1. I used the ls /usr/local command to check which version of the CUDA Toolkit is installed on the system.
2. I updated the PATH and LD_LIBRARY_PATH variables in .bashrc as follows:
```
export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```
3. I ran source .bashrc to apply the changes.
4. Now the nvcc -V command works and shows the correct version for TensorFlow:
```
Cuda compilation tools, release 10.1, V10.1.243  
```
5. I ran pip uninstall tensorflow-gpu and pip uninstall tensorflow-estimator.
6. I ran pip install tensorflow-gpu==2.3.0 (compatible with CUDA 10.1 according to this link).
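The toolkit lookup and the path handling from the first two steps can also be checked from Python before editing .bashrc. This is a minimal sketch and not part of the original steps: `find_cuda_toolkits` and `prepend_path` are hypothetical helper names, and the `/usr/local/cuda-<version>` layout is assumed from the paths above. `prepend_path` mirrors the `${PATH:+:${PATH}}` shell expansion, which adds the `:` separator only when the variable is already set and non-empty.

```python
import os
from pathlib import Path


def find_cuda_toolkits(root="/usr/local"):
    """Return CUDA toolkit directories under root, e.g. /usr/local/cuda-10.1."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    # Assumption: versioned installs are directories named cuda-<major>.<minor>.
    return sorted(str(p) for p in root_path.glob("cuda-*") if p.is_dir())


def prepend_path(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}}: add ':' only if current is set and non-empty."""
    return f"{new_dir}:{current}" if current else new_dir


if __name__ == "__main__":
    for toolkit in find_cuda_toolkits():
        print(toolkit)
        # Values the two export lines would produce for this toolkit.
        print("  PATH            =", prepend_path(f"{toolkit}/bin", os.environ.get("PATH")))
        print("  LD_LIBRARY_PATH =", prepend_path(f"{toolkit}/lib64", os.environ.get("LD_LIBRARY_PATH")))
```

If the script prints nothing, no versioned toolkit directory was found under /usr/local and the export lines would point at a path that does not exist.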
Now, when I run the code below:
```
import tensorflow as tf

print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))
```
The output is:
```
2.3.0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
```
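A mismatch like this one can be caught by reading the release number out of nvcc -V. The sketch below is only an illustration: `parse_nvcc_release` and `installed_toolkit_release` are hypothetical helper names, and the regular expression assumes the "release X.Y" wording of the nvcc output shown above.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract the 'X.Y' toolkit release from `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_toolkit_release():
    """Run `nvcc -V` if nvcc is on PATH and return its release, else None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # nvcc is not on PATH, e.g. before the .bashrc change
    result = subprocess.run([nvcc, "-V"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA toolkit on PATH:", installed_toolkit_release())
```

Keep in mind that the CUDA Version field in the nvidia-smi header reports the newest CUDA version the driver supports, not the toolkit that is installed, so 11.2 there and 10.1 from nvcc can coexist.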
Thank you!


- LeonRanke
- May 25, 2023 at 5:52 pm
- 0 votes
The TensorFlow documentation lists the following specifications:

| Version | Python version | Compiler | Build tools | cuDNN | CUDA |
| --- | --- | --- | --- | --- | --- |
| tensorflow-2.9.0 | 3.7-3.10 | GCC 9.3.1 | Bazel 5.0.0 | 8.1 | 11.2 |
| tensorflow-2.8.0 | 3.7-3.10 | GCC 7.3.1 | Bazel 4.2.1 | 8.1 | 11.2 |

For TensorFlow 2.8.0, the table lists GCC 7.3.1, whereas your cluster has GCC 8.4.0. This mismatch could be the reason why TensorFlow does not detect the GPU. I had a similar issue: I used a version slightly different from the one specified, and my GPU did not show up either.
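The table also lends itself to a quick lookup. The following sketch is illustrative only: `TESTED_BUILDS`, `expected_cuda`, and `toolkit_matches` are hypothetical names, the 2.9 and 2.8 entries come from the table above, and the 2.3 entry comes from the accepted answer in this thread rather than from the table. Matching on the major.minor series, so that 2.8.3 maps to the 2.8.0 row, is an assumption.

```python
# CUDA/cuDNN versions per TensorFlow release series, as quoted in this thread.
TESTED_BUILDS = {
    "2.9": {"cuda": "11.2", "cudnn": "8.1"},
    "2.8": {"cuda": "11.2", "cudnn": "8.1"},
    "2.3": {"cuda": "10.1", "cudnn": None},  # cuDNN version not stated in the thread
}


def expected_cuda(tf_version):
    """Return the CUDA release listed for a TensorFlow version, or None if unknown."""
    series = ".".join(tf_version.split(".")[:2])  # "2.8.3" -> "2.8"
    entry = TESTED_BUILDS.get(series)
    return entry["cuda"] if entry else None


def toolkit_matches(tf_version, toolkit_release):
    """True only if tf_version is listed and its CUDA release equals toolkit_release."""
    expected = expected_cuda(tf_version)
    return expected is not None and expected == toolkit_release


if __name__ == "__main__":
    # Situation from this thread: TensorFlow 2.8.3 with a CUDA 10.1 toolkit.
    print(expected_cuda("2.8.3"))            # 11.2
    print(toolkit_matches("2.8.3", "10.1"))  # False
    print(toolkit_matches("2.3.0", "10.1"))  # True
```

A False result suggests either installing the TensorFlow release that matches the toolkit or pointing PATH and LD_LIBRARY_PATH at a matching toolkit, if one is installed.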
