
I’ve trained a segmentation_models_pytorch.PSPNet model for image segmentation. For prediction I load the whole image into a PyTorch tensor and scan it with a 384×384-pixel window:

result = model.predict(image_tensor[:, :, y:y+384, x:x+384])
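For completeness, the scanning loop looks roughly like this (a minimal sketch; the stride, the single-channel output, and moving each result to the CPU are illustrative assumptions, not my exact code):

import torch

TILE = 384

def predict_large_image(model, image_tensor):
    # image_tensor: (1, C, H, W), on the same device as the model
    _, _, height, width = image_tensor.shape
    output = torch.zeros(1, 1, height, width)  # assumes a single-channel mask
    with torch.no_grad():  # model.predict() also disables gradients internally
        for y in range(0, height - TILE + 1, TILE):
            for x in range(0, width - TILE + 1, TILE):
                result = model.predict(image_tensor[:, :, y:y+TILE, x:x+TILE])
                # move each tile's result off the GPU so its memory can be freed
                output[:, :, y:y+TILE, x:x+TILE] = result.cpu()
    return output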

My Windows machine has a 6 GB GPU, while the Ubuntu machine has an 8 GB GPU. When all models are loaded they consume about 1.4 GB of GPU memory. When processing a large image on Windows, memory consumption increases to about 1.7 GB.

Under Windows the model can handle 25-megapixel images. Under Ubuntu the same code can only process images of up to about 5 megapixels. Debugging is difficult because I only have SSH access to the Ubuntu machine.
What could cause this discrepancy, and how can I debug it?

2 Answers


  1. You can check the CUDA version and the torch version on both machines.

    To debug, you can simply print image_tensor.shape before model.predict; maybe you are running a larger batch on the Linux machine. A quick comparison script is sketched below.
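    A minimal version-comparison sketch, using only standard PyTorch attributes (run it on both machines and compare the output; the commented tile-shape print is illustrative):

    import torch

    # Compare these on both machines; a mismatch in any of them can explain
    # different memory behaviour.
    print("torch:", torch.__version__)
    print("CUDA (build):", torch.version.cuda)
    print("cuDNN:", torch.backends.cudnn.version())
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))
        print("total memory (bytes):", torch.cuda.get_device_properties(0).total_memory)

    # And right before prediction, print the tile that goes into the model:
    # print(image_tensor[:, :, y:y+384, x:x+384].shape)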

  2. First approach to solve:

    First check the versions of CUDA, PyTorch, and your other Python libraries in general. If they are all equal, then check that the libraries installed on Ubuntu are compatible with your GPU driver/CUDA version.

    As commented by Jimmy, you can use a conda environment or a virtual environment created directly with Python (python3 -m venv env) to control the libraries’ versions.

    To debug

    You can install a Python IDE that allows remote debugging. If that is not possible, one very simple but inefficient approach is to print at many points in your code (for example, the GPU memory in use) to see exactly where the execution differs from Windows; see the sketch after this answer.
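    A minimal sketch of such print-based memory tracing, assuming a CUDA device; log_gpu_memory is a hypothetical helper and the call sites are only illustrative:

    import torch

    def log_gpu_memory(tag):
        # Report PyTorch's allocated vs. reserved GPU memory in MB at this point.
        allocated = torch.cuda.memory_allocated() / 1024**2
        reserved = torch.cuda.memory_reserved() / 1024**2
        print(f"[{tag}] allocated: {allocated:.1f} MB, reserved: {reserved:.1f} MB")

    # Example usage around each tile prediction:
    # log_gpu_memory("before predict")
    # result = model.predict(tile)
    # log_gpu_memory("after predict")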
