- tf-nightly version = 2.12.0-dev20230203
- Python version = 3.10.6
- CUDA drivers version = 525.85.12
- CUDA version = 12.0
- Cudnn version = 8.5.0
- I am using Linux (x86_64, Ubuntu 22.04)
- I am coding in Visual Studio Code in a venv virtual environment
I am trying to run some models on the GPU (NVIDIA GeForce RTX 3050) using tensorflow nightly 2.12 (to be able to use CUDA 12.0). The problem is that every check I run seems to pass, yet in the end the script cannot detect the GPU. I've spent a lot of time trying to figure out what is happening and nothing seems to work, so any advice or solution would be more than welcome. The GPU does work for torch, as you can see at the very end of the question.
I will show some of the most common CUDA checks I performed (run from the Visual Studio Code terminal); I hope you find them useful:
Check CUDA version:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
Check that the connection to the CUDA libraries is correct:
$ echo $LD_LIBRARY_PATH
/usr/cuda/lib
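Note: on a default CUDA 12 installation the libraries normally live under /usr/local/cuda-12.0/lib64 rather than /usr/cuda/lib, so it may be worth pointing LD_LIBRARY_PATH there before retrying (the exact path is an assumption about the standard installer layout):

$ export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}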
Check the NVIDIA drivers for the GPU and whether the GPU is visible from the venv:
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...   On  | 00000000:01:00.0  On |                  N/A |
| N/A   40C    P5     6W / 20W  |     46MiB /  4096MiB |     22%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1356      G   /usr/lib/xorg/Xorg                 45MiB |
+-----------------------------------------------------------------------------+
Add cuda/bin to PATH and check it:
$ export PATH="/usr/local/cuda/bin:$PATH"
$ echo $PATH
/usr/local/cuda-12.0/bin:/home/victus-linux/Escritorio/MasterThesis_CODE/to_share/venv_master/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
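To make these variables persist across new terminals, they can be appended to ~/.bashrc (a minimal sketch; adjust the paths to the actual install):

$ echo 'export PATH=/usr/local/cuda-12.0/bin:$PATH' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
$ source ~/.bashrc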
Custom function to check if CUDA is correctly installed: [function by Sherlock]
function lib_installed() { /sbin/ldconfig -N -v $(sed 's/:/ /' <<< $LD_LIBRARY_PATH) 2>/dev/null | grep $1; }
function check() { lib_installed $1 && echo "$1 is installed" || echo "ERROR: $1 is NOT installed"; }
check libcuda
check libcudart
libcudart.so.12 -> libcudart.so.12.0.146
libcuda.so.1 -> libcuda.so.525.85.12
libcuda.so.1 -> libcuda.so.525.85.12
libcudadebugger.so.1 -> libcudadebugger.so.525.85.12
libcuda is installed
libcudart.so.12 -> libcudart.so.12.0.146
libcudart is installed
Custom function to check if Cudnn is correctly installed: [function by Sherlock]
function lib_installed() { /sbin/ldconfig -N -v $(sed 's/:/ /' <<< $LD_LIBRARY_PATH) 2>/dev/null | grep $1; }
function check() { lib_installed $1 && echo "$1 is installed" || echo "ERROR: $1 is NOT installed"; }
check libcudnn
libcudnn_cnn_train.so.8 -> libcudnn_cnn_train.so.8.8.0
libcudnn_cnn_infer.so.8 -> libcudnn_cnn_infer.so.8.8.0
libcudnn_adv_train.so.8 -> libcudnn_adv_train.so.8.8.0
libcudnn.so.8 -> libcudnn.so.8.8.0
libcudnn_ops_train.so.8 -> libcudnn_ops_train.so.8.8.0
libcudnn_adv_infer.so.8 -> libcudnn_adv_infer.so.8.8.0
libcudnn_ops_infer.so.8 -> libcudnn_ops_infer.so.8.8.0
libcudnn is installed
So, once I had done these checks, I used a script to verify that everything was finally OK, and the following error appeared:
import tensorflow as tf
print(f'\nTensorflow version = {tf.__version__}\n')
# An empty list here means TF cannot see the GPU
print(f'\n{tf.config.list_physical_devices("GPU")}\n')
2023-03-02 12:05:09.463343: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-03-02 12:05:09.489911: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-03-02 12:05:09.490522: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-02 12:05:10.066759: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Tensorflow version = 2.12.0-dev20230203
2023-03-02 12:05:10.748675: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-03-02 12:05:10.771263: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
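As an extra diagnostic, these one-liners report whether the installed wheel was built against CUDA at all and, if so, which CUDA version it expects (tf.test.is_built_with_cuda() and tf.sysconfig.get_build_info() are standard TensorFlow APIs):

$ python3 -c "import tensorflow as tf; print(tf.test.is_built_with_cuda())"
$ python3 -c "import tensorflow as tf; print(tf.sysconfig.get_build_info()['cuda_version'])"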
Extra check: I ran an equivalent check with torch, and there it worked, so I guess the problem is specific to tensorflow/tf-nightly:
import torch
print(f'\nAvailable cuda = {torch.cuda.is_available()}')
print(f'\nGPUs available = {torch.cuda.device_count()}')
print(f'\nCurrent device = {torch.cuda.current_device()}')
print(f'\nCurrent Device location = {torch.cuda.device(0)}')
print(f'\nName of the device = {torch.cuda.get_device_name(0)}')
Available cuda = True
GPUs available = 1
Current device = 0
Current Device location = <torch.cuda.device object at 0x7fbe26fd2ec0>
Name of the device = NVIDIA GeForce RTX 3050 Laptop GPU
Please, if you know something that might help solve this issue, don't hesitate to tell me.
3 Answers
"I experienced the same thing, and it can be resolved by installing TensorFlowRT."
After all the missing libraries are fixed, the GPU becomes visible in the output.
GPU visible: [screenshot omitted]
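A minimal sketch of one way to install it with pip (the tensorrt package on PyPI is NVIDIA's; whether TF-TRT picks its libraries up automatically depends on the version, so the loader-path step is an assumption):

$ pip install tensorrt
$ # the shared libraries land inside the package directory; add them to the loader path
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(python3 -c "import tensorrt; print(tensorrt.__path__[0])")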
I think that, as of March 2023, the only TensorFlow distribution for CUDA 12 is the Docker package from NVIDIA.
A tf package built for CUDA 12 should report CUDA 12 in its build info.
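For illustration, something like this (the values beyond the CUDA major version are assumptions):

$ python3 -c "import tensorflow as tf; print(tf.sysconfig.get_build_info())"
OrderedDict([('cuda_version', '12.0'), ('cudnn_version', '8'), ('is_cuda_build', True), ...])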
But if we run tf.sysconfig.get_build_info() on any tensorflow package installed via pip, it still reports that cuda_version is 11.x.
So your alternatives are:
- use one of NVIDIA's recent containers (see the sketch below)
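A minimal sketch of running one of NVIDIA's TensorFlow containers (the image tag is an assumption; check the NGC catalog for a current one):

$ docker run --gpus all -it --rm nvcr.io/nvidia/tensorflow:23.02-tf2-py3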
Just to add another alternative: the official
pip3 install tensorflow
also doesn't work with CUDA 12, so my solution was to just go back to CUDA 11. TensorFlow now works.
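A minimal sketch of that fallback in a fresh environment, following the official TensorFlow pip guide of the time (the exact version pins are assumptions; check the guide):

$ conda install -c conda-forge cudatoolkit=11.8.0
$ pip install nvidia-cudnn-cu11==8.6.0.163
$ pip3 install tensorflow
$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"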