I am trying to build TensorFlow 2.15.0 with GPU support from source (Ubuntu 22.04). All of the documentation I have seen says that CUDA 12.2 should be used, but the build fails unless I have TensorRT installed. That would be fine, except that TensorRT does not support CUDA 12.2 (I cannot even install TensorRT unless I have CUDA <= 12.1).
What am I missing here?
Details:
In order to compile from source I followed these steps:
- Install CUDA 12.2 (as per documentation/release notes) using standard NVIDIA instructions.
- Install cuDNN 8.8 (as per documentation/release notes) using standard NVIDIA instructions.
- Install clang 17 (as per documentation/release notes).
- Clone the tensorflow repository; checkout 2.15.0.
- Run the configure script as follows:
You have bazel 6.1.0 installed.
Please specify the location of python. [Default is /home/christopher/Desktop/code/tf-source/venv/bin/python3]:
Found possible Python library paths:
/home/christopher/Desktop/code/tf-source/venv/lib/python3.10/site-packages
Please input the desired Python library path to use. Default is [/home/christopher/Desktop/code/tf-source/venv/lib/python3.10/site-packages]
Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Do you wish to build TensorFlow with TensorRT support? [y/N]:
No TensorRT support will be enabled for TensorFlow.
Found CUDA 12.2 in:
/usr/local/cuda-12.2/targets/x86_64-linux/lib
/usr/local/cuda-12.2/targets/x86_64-linux/include
Found cuDNN 8 in:
/usr/lib/x86_64-linux-gnu
/usr/include
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 8.9]: 8.0
Do you want to use clang as CUDA compiler? [Y/n]:
Clang will be used as CUDA compiler.
Please specify clang path that to be used as host compiler. [Default is /usr/lib/llvm-17/bin/clang]:
You have Clang 17.0.6 installed.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=mkl_aarch64 # Build with oneDNN and Compute Library for the Arm Architecture (ACL).
--config=monolithic # Config for mostly static monolithic build.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v1 # Build with TensorFlow 1 API instead of TF 2 API.
Preconfigured Bazel build configs to DISABLE default on features:
--config=nogcp # Disable GCP support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
When I compile using:
bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package
I see errors like this:
ERROR: /home/christopher/Desktop/code/tf-source/tensorflow/WORKSPACE:84:14: fetching tensorrt_configure rule //external:local_config_tensorrt: Traceback (most recent call last):
File "/home/christopher/Desktop/code/tf-source/tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 300, column 38, in _tensorrt_configure_impl
_create_local_tensorrt_repository(repository_ctx)
File "/home/christopher/Desktop/code/tf-source/tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 159, column 30, in _create_local_tensorrt_repository
config = find_cuda_config(repository_ctx, ["cuda", "tensorrt"])
File "/home/christopher/Desktop/code/tf-source/tensorflow/third_party/gpus/cuda_configure.bzl", line 649, column 26, in find_cuda_config
exec_result = execute(repository_ctx, [python_bin, repository_ctx.attr._find_cuda_config] + cuda_libraries)
File "/home/christopher/Desktop/code/tf-source/tensorflow/third_party/remote_config/common.bzl", line 230, column 13, in execute
fail(
Error in fail: Repository command failed
Could not find any NvInferVersion.h matching version '' in any subdirectory:
''
'include'
'include/cuda'
'include/*-linux-gnu'
'extras/CUPTI/include'
'include/cuda/CUPTI'
'local/cuda/extras/CUPTI/include'
'targets/x86_64-linux/include'
of:
'/lib'
'/lib/i386-linux-gnu'
'/lib/x86_64-linux-gnu'
'/lib32'
'/usr'
'/usr/lib/x86_64-linux-gnu/libfakeroot'
'/usr/lib32'
'/usr/local/cuda'
'/usr/local/cuda/targets/x86_64-linux/lib'
I believe the missing headers belong to TensorRT, even though I answered "no" to TensorRT support in the configure script. So I tried to install TensorRT following NVIDIA's documentation, but the most recent release does not support CUDA 12.2, only <= 12.1. I have tried installing CUDA 12.1 instead, and with that I can get quite deep into the compilation; however, the official release is built using CUDA 12.2, so I'm stumped at the moment.
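The search described in the error can be reproduced by hand. This is a minimal sketch, assuming bash; `find_nvinfer_header` is a hypothetical helper name, and the subdirectory patterns are a subset of those copied from the error message, not taken from TensorFlow's own code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: look for NvInferVersion.h under the given base
# directories, using subdirectory patterns listed in the error message.
find_nvinfer_header() {
  local base sub candidate
  for base in "$@"; do
    for sub in "" include include/cuda "include/*-linux-gnu" targets/x86_64-linux/include; do
      # $sub is left unquoted on purpose so that "*-linux-gnu" is glob-expanded.
      for candidate in "$base"/$sub/NvInferVersion.h; do
        if [ -f "$candidate" ]; then
          printf '%s\n' "$candidate"
          return 0
        fi
      done
    done
  done
  return 1
}

# Check two of the base directories named in the error message.
find_nvinfer_header /usr /usr/local/cuda \
  || echo "NvInferVersion.h not found; the TensorRT development headers are probably missing"
```

If this prints a path, the header is present in a location the build also searches; if not, the error above is expected.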
2 Answers
Two packages, libnvinfer-dev and libnvinfer-plugin-dev, must be installed. For me, this was as follows:
They are installed alongside TensorRT, but can be installed independently.
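As a rough sketch of that step (not the exact commands from this answer), the following checks which of the two packages are missing and prints an install command for them. It assumes a Debian/Ubuntu system with NVIDIA's apt repository already configured; `missing_packages` is a hypothetical helper name, and you may need to pin package versions that match your CUDA install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named package that dpkg does not
# report as installed.
missing_packages() {
  local pkg
  for pkg in "$@"; do
    if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q "install ok installed"; then
      printf '%s\n' "$pkg"
    fi
  done
}

# Show, rather than run, the install command for whatever is missing.
missing=$(missing_packages libnvinfer-dev libnvinfer-plugin-dev | tr '\n' ' ')
if [ -n "$missing" ]; then
  echo "sudo apt-get install $missing"
fi
```

Printing the command instead of running it keeps the sketch safe to execute without root; run the printed line yourself once the package list looks right.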
Here is a Dockerfile that sets up an environment capable of compiling 2.15 from source. Note the following:
You should set up the TensorRT install path, like this:
By the way, you also need the cuDNN install path.
The bazel build will find the header files of TensorRT via the system environment variables.
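A minimal sketch of setting those paths, assuming the variable names `TENSORRT_INSTALL_PATH` and `CUDNN_INSTALL_PATH`, which I believe are the ones read by `third_party/gpus/find_cuda_config.py` (verify against your checkout). The `/usr` defaults are assumptions for a .deb install; the cuDNN one matches the `/usr/include` location reported by the configure script in the question.

```shell
#!/usr/bin/env bash
# Assumed install prefixes (hypothetical defaults); adjust to your system.
# For a .deb install the headers typically live under /usr/include.
export TENSORRT_INSTALL_PATH="${TENSORRT_INSTALL_PATH:-/usr}"
export CUDNN_INSTALL_PATH="${CUDNN_INSTALL_PATH:-/usr}"

# Warn early if a prefix does not exist, before starting a long build.
for prefix in "$TENSORRT_INSTALL_PATH" "$CUDNN_INSTALL_PATH"; do
  [ -d "$prefix" ] || echo "warning: $prefix does not exist" >&2
done

# Then, from the TensorFlow checkout:
#   ./configure
#   bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package
```

Export the variables in the same shell that runs `./configure` and `bazel build`, so that both see the same values.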