
I’m training a Keras model, and I needed to switch machines to get more power (from an i3 running Windows to an i7 running Ubuntu). The problem is that my code works fine on the Windows machine, but on Ubuntu it produces the following error, which stops the computation before the first epoch even runs.
Here is the full output:

/home/willylutz/PycharmProjects/hiv_image_analysis/venv/bin/python /home/willylutz/PycharmProjects/hiv_image_analysis/main.py 
2022-09-19 09:36:52.801711: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-19 09:36:52.956260: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-09-19 09:36:53.502748: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/willylutz/PycharmProjects/hiv_image_analysis/venv/lib/python3.8/site-packages/cv2/../../lib64:
2022-09-19 09:36:53.502794: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/willylutz/PycharmProjects/hiv_image_analysis/venv/lib/python3.8/site-packages/cv2/../../lib64:
2022-09-19 09:36:53.502800: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
0
Found 480 files belonging to 2 classes.
Using 384 files for training.
2022-09-19 09:37:00.058171: E tensorflow/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-09-19 09:37:00.058202: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (zhang): /proc/driver/nvidia/version does not exist
2022-09-19 09:37:00.067388: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Found 480 files belonging to 2 classes.
Using 96 files for validation.
['INF', 'NI']
2022-09-19 09:37:10.149236: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:390] Filling up shuffle buffer (this may take a while): 373 of 512
2022-09-19 09:37:10.197351: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:415] Shuffle buffer filled.
(64, 1024, 1024, 3)
(64,)
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 sequential (Sequential)     (None, 1024, 1024, 3)     0         
                                                                 
 rescaling (Rescaling)       (None, 1024, 1024, 3)     0         
                                                                 
 conv2d (Conv2D)             (None, 1024, 1024, 16)    448       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 512, 512, 16)     0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 512, 512, 32)      4640      
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 256, 256, 32)     0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 256, 256, 64)      18496     
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 128, 128, 64)     0         
 2D)                                                             
                                                                 
 dropout (Dropout)           (None, 128, 128, 64)      0         
                                                                 
 flatten (Flatten)           (None, 1048576)           0         
                                                                 
 dense (Dense)               (None, 128)               134217856 
                                                                 
 outputs (Dense)             (None, 2)                 258       
                                                                 
=================================================================
Total params: 134,241,698
Trainable params: 134,241,698
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
2022-09-19 09:37:16.377367: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 4294967296 exceeds 10% of free system memory.

Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

If needed I can post my code as well, but I don't think it is the problem.
Thanks for the help.

4 Answers


  1. There is a bug in Keras/TensorFlow 2.9 and 2.10 which causes preprocessing layers such as Rescaling to be extremely slow:
    https://github.com/tensorflow/tensorflow/issues/56242

    Try the model without the Rescaling layer, for example by scaling the images in the input pipeline instead (see the sketch below). If you want to use this or similar preprocessing layers, you should use TF version 2.8.3 or older.
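
    For instance, a minimal sketch of scaling in the tf.data pipeline rather than inside the model (here train_ds stands in for whatever image_dataset_from_directory returned in your own code):

        import tensorflow as tf

        def scale(image, label):
            # Same effect as Rescaling(1./255), but applied in the tf.data
            # pipeline rather than through a preprocessing layer in the model.
            return tf.cast(image, tf.float32) / 255.0, label

        train_ds = train_ds.map(scale, num_parallel_calls=tf.data.AUTOTUNE)

    Alternatively, pinning the older release is just pip install tensorflow==2.8.3.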

    I am not sure about the following, but

    Could not load dynamic library 'libnvinfer.so.7'
    

    seems to suggest there is also some other problem, perhaps a mismatched version of TensorFlow / Keras / CUDA.

  2. These two warnings:

    Could not load dynamic library 'libnvinfer.so.7'
    libnvinfer_plugin.so.7: cannot open shared object file
    

    can be safely ignored.

    NVidia has a custom neural network framework called TensorRT.

    In short, TensorRT optimizes the internal model graph, which means models execute faster. Usually, you first export a TensorFlow model to ONNX, then convert the ONNX model to TensorRT. Recently, advances from both TF and NVidia have made it possible to take advantage of this and run TensorFlow models much faster in inference mode (on computers with an NVidia GPU).

    Previously, you had to build TF from source to enable the TensorRT integration. It seems that in the newest versions it is enabled by default (I suppose that prior to downgrading to TF v2.8 you were on TF v2.11).

    However, for TF models to run with the TensorRT optimizations, you need to install certain NVidia drivers and libraries on your PC. Because you do not have them installed, TF warns you that it can't find them.

    As stated at the top, you can safely ignore the warning; it will not affect your training or TF performance in any way.
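
    If you want to double-check that nothing GPU-related is actually in play, a quick sanity check is enough (a minimal sketch):

        import tensorflow as tf

        # On a CPU-only machine this prints an empty list; training then simply
        # runs on the CPU, regardless of the TensorRT warnings above.
        print(tf.config.list_physical_devices('GPU'))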

  3. I ran into the same problem. I solved it by copying the source code of the Keras Rescaling layer into my own code and making my "own" Rescaling class. I changed math_ops.cast() to tf.cast() and it worked like a charm: no warnings, and the code ran much faster.

        import tensorflow as tf

        class Rescaling(tf.keras.layers.Layer):
            """Multiply inputs by `scale` and add `offset`.

            For instance:

            1. To rescale an input in the `[0, 255]` range to be in the
               `[0, 1]` range, you would pass `scale=1./255`.
            2. To rescale an input in the `[0, 255]` range to be in the
               `[-1, 1]` range, you would pass `scale=1./127.5, offset=-1`.

            The rescaling is applied both during training and inference.

            Input shape:
                Arbitrary.
            Output shape:
                Same as input.
            Arguments:
                scale: Float, the scale to apply to the inputs.
                offset: Float, the offset to apply to the inputs.
                name: A string, the name of the layer.
            """

            def __init__(self, scale, offset=0., name=None, **kwargs):
                self.scale = scale
                self.offset = offset
                super(Rescaling, self).__init__(name=name, **kwargs)

            def call(self, inputs):
                # Cast to the layer's compute dtype, then scale and shift.
                dtype = self._compute_dtype
                scale = tf.cast(self.scale, dtype)
                offset = tf.cast(self.offset, dtype)
                return tf.cast(inputs, dtype) * scale + offset

            def compute_output_shape(self, input_shape):
                return input_shape

            def get_config(self):
                config = {
                    'scale': self.scale,
                    'offset': self.offset,
                }
                base_config = super(Rescaling, self).get_config()
                return dict(list(base_config.items()) + list(config.items()))
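
    The custom class can then be dropped in wherever the built-in layer was used, for example (a sketch; the input shape just mirrors the question's model):

        model = tf.keras.Sequential([
            tf.keras.Input(shape=(1024, 1024, 3)),
            Rescaling(1. / 255),
            tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu'),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(2),
        ])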
    
  4. Data augmentation is one of the reasons for this problem. I commented out the rotation option (layers.experimental.preprocessing.RandomRotation(0.2)), tried it again, and the warning disappeared.
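
    For example, a typical augmentation block with the rotation disabled might look like this (a sketch; your own layers and factors may differ):

        data_augmentation = tf.keras.Sequential([
            tf.keras.layers.experimental.preprocessing.RandomFlip("horizontal"),
            # tf.keras.layers.experimental.preprocessing.RandomRotation(0.2),  # disabled
            tf.keras.layers.experimental.preprocessing.RandomZoom(0.1),
        ])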
