
I’m using a cluster to train my machine learning model (TensorFlow) in Jupyter Notebook. JupyterHub (Python 3.7.5), CUDA, and cuDNN were already installed on the cluster before I started using it. The cluster runs Ubuntu 18.04 with GCC 8.4.0. When I run the nvidia-smi command, I get the following output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
(...)
|===============================+======================+======================|
|   0  Quadro P4000        On   | 00000000:00:05.0 Off |                  N/A |
(...)
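The "CUDA Version" field in this header is the highest CUDA version the installed driver supports. It does not tell you which CUDA Toolkit is installed on the machine. The small sketch below pulls both fields out of the header line so they can be compared with the toolkit version later. The function name parse_nvidia_smi_header is hypothetical.

```python
import re
from typing import NamedTuple


class DriverInfo(NamedTuple):
    driver_version: str      # e.g. "460.32.03"
    max_cuda_supported: str  # newest CUDA version this driver can serve


def parse_nvidia_smi_header(text: str) -> DriverInfo:
    """Extract the driver version and the driver's maximum supported CUDA version."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    if driver is None or cuda is None:
        raise ValueError("text does not look like an nvidia-smi header")
    return DriverInfo(driver.group(1), cuda.group(1))


header = "| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |"
info = parse_nvidia_smi_header(header)
print(info.driver_version, info.max_cuda_supported)  # 460.32.03 11.2
```

To see which toolkit is actually on the PATH, nvcc -V is the command to check.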

I am not the system administrator, so I installed tensorflow-gpu with pip. However, when I train the model, neither Jupyter Notebook nor TensorFlow detects a GPU, as shown below:

Code:

import tensorflow as tf

print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))

Output:

2.8.3
[]
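A useful first diagnostic is to ask the installed TensorFlow package which CUDA and cuDNN versions it was compiled against, and compare them with what is installed on the cluster. This is a sketch assuming TensorFlow 2.3 or later, where tf.sysconfig.get_build_info() returns a dict describing the build. The helper summarize_build_info is hypothetical, and missing keys are reported as "unknown" rather than guessed.

```python
from typing import Mapping


def summarize_build_info(info: Mapping[str, object]) -> str:
    """Describe the CUDA/cuDNN versions a TensorFlow package was built against."""
    if not info.get("is_cuda_build", False):
        return "CPU-only build: TensorFlow will not use the GPU"
    cuda = info.get("cuda_version", "unknown")
    cudnn = info.get("cudnn_version", "unknown")
    return f"built for CUDA {cuda}, cuDNN {cudnn}"


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this kernel")
    else:
        print(tf.__version__)
        print(summarize_build_info(tf.sysconfig.get_build_info()))
        print(tf.config.list_physical_devices("GPU"))
```

If the reported CUDA version differs from the toolkit installed under /usr/local, TensorFlow cannot load the GPU libraries it expects and falls back to the CPU.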

I hope that you can help me.

Here are the steps I have already taken:

  • Reinstalled the TensorFlow packages;
  • Installed a different version of TensorFlow.

2 Answers


  1. Chosen as BEST ANSWER

    Thanks for the answer. I solved the problem by following the steps below.

    1. I ran ls /usr/local to see which version of the CUDA Toolkit is installed on the system (here, /usr/local/cuda-10.1).

    2. I updated the PATH and LD_LIBRARY_PATH variables in ~/.bashrc as follows:

    export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    
    3. I ran source ~/.bashrc to apply the changes.

    4. Now the nvcc -V command works and reports the CUDA version, which determines the TensorFlow version to install:

    Cuda compilation tools, release 10.1, V10.1.243  
    
    5. I uninstalled the existing packages with pip uninstall tensorflow-gpu and pip uninstall tensorflow-estimator.

    6. I installed a version built for CUDA 10.1 with pip install tensorflow-gpu==2.3.0 (compatible with CUDA 10.1 according to this link).
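The checks in the steps above (listing /usr/local, reading nvcc -V, editing ~/.bashrc) can be scripted. The sketch below is an illustration, not part of the original answer: it assumes toolkits are installed as /usr/local/cuda-X.Y directories, and all function names are hypothetical. It prints export lines in the same ${VAR:+:${VAR}} form as the .bashrc lines above, which adds the separating colon only when the variable is already set and non-empty.

```python
import os
import re
from typing import List, Optional, Tuple

Version = Tuple[int, int]


def installed_cuda_toolkits(root: str = "/usr/local") -> List[Version]:
    """Return (major, minor) for every cuda-X.Y directory under root, sorted."""
    try:
        entries = os.listdir(root)
    except OSError:
        return []
    versions = []
    for name in entries:
        m = re.fullmatch(r"cuda-(\d+)\.(\d+)", name)
        if m and os.path.isdir(os.path.join(root, name)):
            versions.append((int(m.group(1)), int(m.group(2))))
    return sorted(versions)


def parse_nvcc_release(text: str) -> Optional[Version]:
    """Extract the release number from `nvcc -V` output, e.g. 'release 10.1'."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def export_lines(version: Version, root: str = "/usr/local") -> str:
    """Build ~/.bashrc lines that prepend one toolkit to PATH and LD_LIBRARY_PATH."""
    home = f"{root}/cuda-{version[0]}.{version[1]}"
    return (
        f"export PATH={home}/bin${{PATH:+:${{PATH}}}}\n"
        f"export LD_LIBRARY_PATH={home}/lib64${{LD_LIBRARY_PATH:+:${{LD_LIBRARY_PATH}}}}"
    )


if __name__ == "__main__":
    print("Toolkits found:", installed_cuda_toolkits())
    print(parse_nvcc_release("Cuda compilation tools, release 10.1, V10.1.243"))
    print(export_lines((10, 1)))
```

The doubled braces in the f-strings emit literal { and }, so the generated lines are ordinary shell parameter expansions.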

    Now, when I run the code below:

    import tensorflow as tf
    
    print(tf.__version__)
    print(tf.config.list_physical_devices('GPU'))
    

    The output is:

    2.3.0
    [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
    

    Thank you!


  2. The TensorFlow documentation lists the following specifications:

    Version            Python version   Compiler    Build tools    cuDNN   CUDA
    tensorflow-2.9.0   3.7-3.10         GCC 9.3.1   Bazel 5.0.0    8.1     11.2
    tensorflow-2.8.0   3.7-3.10         GCC 7.3.1   Bazel 4.2.1    8.1     11.2

    For TensorFlow 2.8.0, the listed compiler is GCC 7.3.1, whereas your cluster has GCC 8.4.0. This mismatch could be why TensorFlow does not detect the GPU. I had a similar issue when I used a slightly different version than the one specified, and my GPU did not show up either.
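To check a combination against the table programmatically, the sketch below hard-codes the two rows quoted above plus the tensorflow-2.3.0 row (CUDA 10.1, cuDNN 7.6) that matches the accepted answer. The names TESTED_BUILDS and check_cuda are hypothetical, the lookup assumes patch releases (such as 2.8.3) share the requirements of their x.y.0 row, and it only compares the CUDA column. The authoritative list is the tested build configurations table in the TensorFlow documentation.

```python
from typing import Dict, NamedTuple


class BuildConfig(NamedTuple):
    python: str
    gcc: str
    cudnn: str
    cuda: str


# Linux GPU rows from TensorFlow's tested build configurations, keyed by major.minor.
TESTED_BUILDS: Dict[str, BuildConfig] = {
    "2.9": BuildConfig("3.7-3.10", "9.3.1", "8.1", "11.2"),
    "2.8": BuildConfig("3.7-3.10", "7.3.1", "8.1", "11.2"),
    "2.3": BuildConfig("3.5-3.8", "7.3.1", "7.6", "10.1"),
}


def major_minor(version: str) -> str:
    """Reduce '10.1.243' or '2.8.3' to '10.1' or '2.8'."""
    return ".".join(version.split(".")[:2])


def check_cuda(tf_version: str, cuda_toolkit: str) -> str:
    """Compare an installed CUDA toolkit with the version a TensorFlow release expects."""
    cfg = TESTED_BUILDS.get(major_minor(tf_version))
    if cfg is None:
        return f"tensorflow-{tf_version}: not in this table"
    found = major_minor(cuda_toolkit)
    if found == cfg.cuda:
        return f"tensorflow-{tf_version}: OK with CUDA {cfg.cuda} (needs cuDNN {cfg.cudnn})"
    return (f"tensorflow-{tf_version}: expects CUDA {cfg.cuda} and cuDNN {cfg.cudnn}, "
            f"found CUDA {found}")


print(check_cuda("2.8.3", "10.1.243"))
# tensorflow-2.8.3: expects CUDA 11.2 and cuDNN 8.1, found CUDA 10.1
print(check_cuda("2.3.0", "10.1.243"))
# tensorflow-2.3.0: OK with CUDA 10.1 (needs cuDNN 7.6)
```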
