I am trying to build TensorFlow 2.15.0 with GPU support from source (Ubuntu 22.04). All of the documentation I have seen says that CUDA 12.2 should be used, but the build fails unless I have TensorRT installed. Fine, but TensorRT does not support CUDA 12.2 (I cannot even install TensorRT unless I have CUDA <= 12.1).

What am I missing here?

Details:

In order to compile from source I followed these steps:

  1. Install CUDA 12.2 (as per documentation/release notes) using standard NVIDIA instructions.
  2. Install cuDNN 8.8 (as per documentation/release notes) using standard NVIDIA instructions.
  3. Install clang 17 (as per documentation/release notes).
  4. Clone the tensorflow repository; checkout 2.15.0.
  5. I run the configure script as follows:
    You have bazel 6.1.0 installed.
    Please specify the location of python. [Default is /home/christopher/Desktop/code/tf-source/venv/bin/python3]: 
    
    
    Found possible Python library paths:
      /home/christopher/Desktop/code/tf-source/venv/lib/python3.10/site-packages
    Please input the desired Python library path to use.  Default is [/home/christopher/Desktop/code/tf-source/venv/lib/python3.10/site-packages]
    
    Do you wish to build TensorFlow with ROCm support? [y/N]: 
    No ROCm support will be enabled for TensorFlow.
    
    Do you wish to build TensorFlow with CUDA support? [y/N]: y
    CUDA support will be enabled for TensorFlow.
    
    Do you wish to build TensorFlow with TensorRT support? [y/N]: 
    No TensorRT support will be enabled for TensorFlow.
    
    Found CUDA 12.2 in:
        /usr/local/cuda-12.2/targets/x86_64-linux/lib
        /usr/local/cuda-12.2/targets/x86_64-linux/include
    Found cuDNN 8 in:
        /usr/lib/x86_64-linux-gnu
        /usr/include
    
    
    Please specify a list of comma-separated CUDA compute capabilities you want to build with.
    You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.
    Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 8.9]: 8.0
    
    
    Do you want to use clang as CUDA compiler? [Y/n]: 
    Clang will be used as CUDA compiler.
    
    Please specify clang path that to be used as host compiler. [Default is /usr/lib/llvm-17/bin/clang]: 
    
    
    You have Clang 17.0.6 installed.
    
    Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]: 
    
    
    Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
    Not configuring the WORKSPACE for Android builds.
    
    Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
        --config=mkl            # Build with MKL support.
        --config=mkl_aarch64    # Build with oneDNN and Compute Library for the Arm Architecture (ACL).
        --config=monolithic     # Config for mostly static monolithic build.
        --config=numa           # Build with NUMA support.
        --config=dynamic_kernels    # (Experimental) Build kernels into separate shared objects.
        --config=v1             # Build with TensorFlow 1 API instead of TF 2 API.
    Preconfigured Bazel build configs to DISABLE default on features:
        --config=nogcp          # Disable GCP support.
        --config=nonccl         # Disable NVIDIA NCCL support.
    Configuration finished

When I compile using:

    bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package

I see errors like this:

ERROR: /home/christopher/Desktop/code/tf-source/tensorflow/WORKSPACE:84:14: fetching tensorrt_configure rule //external:local_config_tensorrt: Traceback (most recent call last):
    File "/home/christopher/Desktop/code/tf-source/tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 300, column 38, in _tensorrt_configure_impl
        _create_local_tensorrt_repository(repository_ctx)
    File "/home/christopher/Desktop/code/tf-source/tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 159, column 30, in _create_local_tensorrt_repository
        config = find_cuda_config(repository_ctx, ["cuda", "tensorrt"])
    File "/home/christopher/Desktop/code/tf-source/tensorflow/third_party/gpus/cuda_configure.bzl", line 649, column 26, in find_cuda_config
        exec_result = execute(repository_ctx, [python_bin, repository_ctx.attr._find_cuda_config] + cuda_libraries)
    File "/home/christopher/Desktop/code/tf-source/tensorflow/third_party/remote_config/common.bzl", line 230, column 13, in execute
        fail(
Error in fail: Repository command failed
Could not find any NvInferVersion.h matching version '' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
        'local/cuda/extras/CUPTI/include'
        'targets/x86_64-linux/include'
of:
        '/lib'
        '/lib/i386-linux-gnu'
        '/lib/x86_64-linux-gnu'
        '/lib32'
        '/usr'
        '/usr/lib/x86_64-linux-gnu/libfakeroot'
        '/usr/lib32'
        '/usr/local/cuda'
        '/usr/local/cuda/targets/x86_64-linux/lib'

The missing headers belong to TensorRT I believe. So I try to install TensorRT using NVIDIA’s documentation. But CUDA 12.2 is not supported in the most recent release, only <= 12.1. Obviously, I have tried installing 12.1 and then I can get quite deep into the compilation; however the official release is built using CUDA 12.2, so I’m stumped at the moment.
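
For reference, the failing step is looking for a file named NvInferVersion.h below the base directories listed in the error. The following is a minimal sketch for checking by hand whether that header exists anywhere on the machine. The helper name `find_header` and the directories in the commented example call are illustrative assumptions, not part of TensorFlow's build scripts.

```shell
#!/bin/sh
# find_header NAME ROOT...
# Print every file called NAME found below the given root directories.
# Roots that do not exist are skipped. -L follows symlinks, because a root
# such as /usr/local/cuda is typically a symlink to a versioned directory.
find_header() (
    name=$1
    shift
    for root in "$@"; do
        [ -d "$root" ] || continue
        find -L "$root" -name "$name" 2>/dev/null
    done
)

# Example call (hypothetical choice of roots, taken from the error output):
#   find_header NvInferVersion.h /usr/include /usr/local/cuda
```

If the call prints nothing, the header is not installed under those roots, which would be consistent with the error shown above.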

2 Answers


  1. Chosen as BEST ANSWER

    Two libraries, libnvinfer-dev and libnvinfer-plugin-dev, must be installed. For me, this was as follows:

    sudo apt-get install -y libnvinfer-dev=8.6.1.6-1+cuda12.0 libnvinfer-plugin-dev=8.6.1.6-1+cuda12.0
    

    These packages are installed as part of TensorRT, but they can also be installed on their own.

    Here is a Dockerfile that sets up an environment capable of compiling 2.15 from source. Note the following:

    • The cuDNN .deb file must be downloaded manually and placed in the Docker build directory.
    • Once built, cd into the build directory, pull the latest code and check out the v2.15.0 branch.
    • Run the configure script (do not use clang, as there is a known issue with building 2.15.0 with clang; use nvcc).

    FROM ubuntu:22.04
    
    ENV DEBIAN_FRONTEND=noninteractive
    
    WORKDIR /downloads
    
    RUN apt-get update && \
        apt-get install -y --no-install-recommends wget ca-certificates git lsb-release software-properties-common gnupg && \
        rm -rf /var/lib/apt/lists/*
    
    # Install Bazelisk.
    RUN wget https://github.com/bazelbuild/bazelisk/releases/download/v1.19.0/bazelisk-linux-amd64 -O /usr/local/bin/bazel && \
        chmod +x /usr/local/bin/bazel
    
    # Install LLVM/Clang 16
    RUN wget https://apt.llvm.org/llvm.sh && \
        chmod +x llvm.sh && \
        ./llvm.sh 16 && \
        rm llvm.sh
    
    # Install CUDA Toolkit 12.2.
    RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin && \
        mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600 && \
        wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb && \
        dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb && \
        cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/ && \
        apt-get update && \
        apt-get -y install cuda
    
    # Install cuDNN. This .deb file must be fetched manually from the NVIDIA archive (TODO: there is a way to get around the required authorization and use wget).
    COPY cudnn-local-repo-ubuntu2204-8.8.1.3_1.0-1_amd64.deb /downloads/
    RUN dpkg -i cudnn-local-repo-ubuntu2204-8.8.1.3_1.0-1_amd64.deb && \
        cp /var/cudnn-local-repo-ubuntu2204-8.8.1.3/cudnn-local-*-keyring.gpg /usr/share/keyrings/ && \
        apt-get update && \
        apt-get install -y libcudnn8=8.8.1.3-1+cuda12.0 && \
        apt-get install -y libcudnn8-dev=8.8.1.3-1+cuda12.0 && \
        apt-get install -y libcudnn8-samples=8.8.1.3-1+cuda12.0
    
    # Fetch the tensorflow source code.
    RUN git clone https://github.com/tensorflow/tensorflow.git
    
    # Install nvinfer dependencies.
    RUN apt-get install -y --no-install-recommends gnupg2 curl ca-certificates && \
        curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-archive-keyring.gpg -o /usr/share/keyrings/cuda-archive-keyring.gpg && \
        echo "deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /" > /etc/apt/sources.list.d/cuda.list && \
        apt-get purge --autoremove -y curl && \
        rm -rf /var/lib/apt/lists/* && \
        apt-get update && \
        apt-get install -y --no-install-recommends libnvinfer-dev=8.6.1.6-1+cuda12.0 libnvinfer-plugin-dev=8.6.1.6-1+cuda12.0 && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/*
    
    RUN apt-get update && apt-get -y install libstdc++-12-dev 
    
    RUN apt-get install -y python-is-python3 python3-pip python3-dev patchelf
    
    CMD ["/bin/bash"]
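
Once the two libnvinfer packages are installed, it can be useful to confirm which TensorRT version the header actually reports before starting a long build. The sketch below reads the version from NvInferVersion.h. It assumes the header defines NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH and NV_TENSORRT_BUILD as plain integer macros; the function name `tensorrt_version` is made up for this example.

```shell
#!/bin/sh
# tensorrt_version HEADER
# Print MAJOR.MINOR.PATCH.BUILD as read from the NV_TENSORRT_* macros in the
# given NvInferVersion.h. Returns non-zero if the file is unreadable or a
# macro is missing.
tensorrt_version() (
    header=$1
    [ -r "$header" ] || { echo "cannot read $header" >&2; exit 1; }
    version=
    for part in MAJOR MINOR PATCH BUILD; do
        # Match lines of the form: #define NV_TENSORRT_<part> <number> ...
        value=$(awk -v macro="NV_TENSORRT_$part" \
            '$1 == "#define" && $2 == macro { print $3; exit }' "$header")
        [ -n "$value" ] || { echo "NV_TENSORRT_$part not found in $header" >&2; exit 1; }
        version=${version:+$version.}$value
    done
    echo "$version"
)
```

On Ubuntu the Debian packages are assumed to place the header at /usr/include/x86_64-linux-gnu/NvInferVersion.h, so a call would look like `tensorrt_version /usr/include/x86_64-linux-gnu/NvInferVersion.h`; adjust the path if the header lives elsewhere on your system.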
    

  2. You should set up the TensorRT install path, like this:

    export TENSORRT_INSTALL_PATH=<Your/TensorRT/install/path>
    

    You also need to set the cuDNN install path:

    export CUDNN_INSTALL_PATH=<Your/cuDNN/install/path>
    

    The bazel build will then find the TensorRT header files via these environment variables.
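
Before launching the build, a quick sanity check that both variables point at directories that really contain the expected headers can save a failed run. This is only a sketch: the helper names `has_header` and `check_install_paths` are invented here, the search depth of three levels is an arbitrary choice, and cudnn_version.h is assumed to be the version header shipped with cuDNN 8.

```shell
#!/bin/sh
# has_header DIR NAME
# Succeed if DIR is a directory containing a file NAME at most three levels
# down (for example DIR/include/NAME).
has_header() (
    dir=$1
    name=$2
    [ -n "$dir" ] && [ -d "$dir" ] || exit 1
    [ -n "$(find -L "$dir" -maxdepth 3 -name "$name" 2>/dev/null | head -n 1)" ]
)

# check_install_paths
# Report on stderr which of the two variables does not lead to its header.
# Returns 0 only if both look usable.
check_install_paths() {
    rc=0
    has_header "${TENSORRT_INSTALL_PATH:-}" NvInferVersion.h || {
        echo "TENSORRT_INSTALL_PATH: NvInferVersion.h not found" >&2
        rc=1
    }
    has_header "${CUDNN_INSTALL_PATH:-}" cudnn_version.h || {
        echo "CUDNN_INSTALL_PATH: cudnn_version.h not found" >&2
        rc=1
    }
    return $rc
}
```

After exporting the two variables as shown above, running `check_install_paths && bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package` would only start the build when both headers were found.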
