- tf-nightly version = 2.12.0-dev20230203
- Python version = 3.10.6
- CUDA drivers version = 525.85.12
- CUDA version = 12.0
- Cudnn version = 8.5.0
- I am using Linux (x86_64, Ubuntu 22.04)
- I am coding in Visual Studio Code in a venv virtual environment
I am trying to run some models on the GPU (NVIDIA GeForce RTX 3050) using tensorflow nightly 2.12 (to be able to use CUDA 12.0). The problem is that every check I run seems to pass, yet in the end the script cannot detect the GPU. I've spent a lot of time trying to figure out what is happening and nothing seems to work, so any advice or solution would be more than welcome. The GPU does work for torch, as you can see at the very end of the question.
I will show some of the most common CUDA checks I performed (run from the Visual Studio Code terminal); I hope you find them useful:
Check CUDA version:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
Check that the connection to the CUDA libraries is correct:
$ echo $LD_LIBRARY_PATH
/usr/cuda/lib
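Note: on a default CUDA 12 installation the libraries normally live under /usr/local/cuda-12.0/lib64 rather than /usr/cuda/lib, so it may be worth pointing LD_LIBRARY_PATH there before retrying (the exact path is an assumption about the standard installer layout):

$ export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}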
Check the NVIDIA drivers for the GPU and whether the GPU is visible from the venv:
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...   On  | 00000000:01:00.0  On |                  N/A |
| N/A   40C    P5     6W / 20W  |     46MiB /  4096MiB |     22%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1356      G   /usr/lib/xorg/Xorg                 45MiB |
+-----------------------------------------------------------------------------+
Add cuda/bin to PATH and check it:
$ export PATH="/usr/local/cuda/bin:$PATH"
$ echo $PATH
/usr/local/cuda-12.0/bin:/home/victus-linux/Escritorio/MasterThesis_CODE/to_share/venv_master/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
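To make these variables persist across new terminals, they can be appended to ~/.bashrc (a minimal sketch; adjust the paths to the actual install):

$ echo 'export PATH=/usr/local/cuda-12.0/bin:$PATH' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
$ source ~/.bashrc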
Custom function to check if CUDA is correctly installed: [function by Sherlock]
function lib_installed() { /sbin/ldconfig -N -v $(sed 's/:/ /' <<< $LD_LIBRARY_PATH) 2>/dev/null | grep $1; }
function check() { lib_installed $1 && echo "$1 is installed" || echo "ERROR: $1 is NOT installed"; }
check libcuda
check libcudart
libcudart.so.12 -> libcudart.so.12.0.146
libcuda.so.1 -> libcuda.so.525.85.12
libcuda.so.1 -> libcuda.so.525.85.12
libcudadebugger.so.1 -> libcudadebugger.so.525.85.12
libcuda is installed
libcudart.so.12 -> libcudart.so.12.0.146
libcudart is installed
Custom function to check if Cudnn is correctly installed: [function by Sherlock]
function lib_installed() { /sbin/ldconfig -N -v $(sed 's/:/ /' <<< $LD_LIBRARY_PATH) 2>/dev/null | grep $1; }
function check() { lib_installed $1 && echo "$1 is installed" || echo "ERROR: $1 is NOT installed"; }
check libcudnn
libcudnn_cnn_train.so.8 -> libcudnn_cnn_train.so.8.8.0
libcudnn_cnn_infer.so.8 -> libcudnn_cnn_infer.so.8.8.0
libcudnn_adv_train.so.8 -> libcudnn_adv_train.so.8.8.0
libcudnn.so.8 -> libcudnn.so.8.8.0
libcudnn_ops_train.so.8 -> libcudnn_ops_train.so.8.8.0
libcudnn_adv_infer.so.8 -> libcudnn_adv_infer.so.8.8.0
libcudnn_ops_infer.so.8 -> libcudnn_ops_infer.so.8.8.0
libcudnn is installed
So, once I had done these checks, I used a script to verify that everything was finally OK, and the following error appeared:
import tensorflow as tf
print(f'\nTensorflow version = {tf.__version__}\n')
# An empty list here means TF cannot see the GPU
print(f'\n{tf.config.list_physical_devices("GPU")}\n')
2023-03-02 12:05:09.463343: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-03-02 12:05:09.489911: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-03-02 12:05:09.490522: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-02 12:05:10.066759: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Tensorflow version = 2.12.0-dev20230203
2023-03-02 12:05:10.748675: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-03-02 12:05:10.771263: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
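As an extra diagnostic, these one-liners report whether the installed wheel was built against CUDA at all and, if so, which CUDA version it expects (tf.test.is_built_with_cuda() and tf.sysconfig.get_build_info() are standard TensorFlow APIs):

$ python3 -c "import tensorflow as tf; print(tf.test.is_built_with_cuda())"
$ python3 -c "import tensorflow as tf; print(tf.sysconfig.get_build_info()['cuda_version'])"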
Extra check: I ran an equivalent check with torch, and there it worked, so I guess the problem is specific to tensorflow/tf-nightly:
import torch
print(f'\nAvailable cuda = {torch.cuda.is_available()}')
print(f'\nGPUs available = {torch.cuda.device_count()}')
print(f'\nCurrent device = {torch.cuda.current_device()}')
print(f'\nCurrent Device location = {torch.cuda.device(0)}')
print(f'\nName of the device = {torch.cuda.get_device_name(0)}')
Available cuda = True
GPUs available = 1
Current device = 0
Current Device location = <torch.cuda.device object at 0x7fbe26fd2ec0>
Name of the device = NVIDIA GeForce RTX 3050 Laptop GPU
Please, if you know something that might help solve this issue, don't hesitate to tell me.
3 Answers
"I experienced the same thing, and it can be resolved by installing TensorFlowRT."
After all the missing libraries are fixed, the GPU becomes visible in the output.
GPU visible: [screenshot omitted]
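A minimal sketch of one way to install it with pip (the tensorrt package on PyPI is NVIDIA's; whether TF-TRT picks its libraries up automatically depends on the version, so the loader-path step is an assumption):

$ pip install tensorrt
$ # the shared libraries land inside the package directory; add them to the loader path
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(python3 -c "import tensorrt; print(tensorrt.__path__[0])")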
I think that, as of March 2023, the only TensorFlow distribution for CUDA 12 is the Docker package from NVIDIA.
A tf package built for CUDA 12 should report CUDA 12 in its build info.
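For illustration, something like this (the values beyond the CUDA major version are assumptions):

$ python3 -c "import tensorflow as tf; print(tf.sysconfig.get_build_info())"
OrderedDict([('cuda_version', '12.0'), ('cudnn_version', '8'), ('is_cuda_build', True), ...])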
But if we run tf.sysconfig.get_build_info() on any tensorflow package installed via pip, it still reports that cuda_version is 11.x.
So your alternatives are:
- use one of NVIDIA's recent containers (see the sketch below)
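A minimal sketch of running one of NVIDIA's TensorFlow containers (the image tag is an assumption; check the NGC catalog for a current one):

$ docker run --gpus all -it --rm nvcr.io/nvidia/tensorflow:23.02-tf2-py3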
Just to add another alternative: the official
pip3 install tensorflow
also doesn't work with CUDA 12, so my solution was to just go back to CUDA 11. TensorFlow now works.
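A minimal sketch of that fallback in a fresh environment, following the official TensorFlow pip guide of the time (the exact version pins are assumptions; check the guide):

$ conda install -c conda-forge cudatoolkit=11.8.0
$ pip install nvidia-cudnn-cu11==8.6.0.163
$ pip3 install tensorflow
$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"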