I’ve trained a segmentation_models_pytorch.PSPNet model for image segmentation. For prediction, I load the whole image into a PyTorch tensor and scan it with a 384×384-pixel window:
result = model.predict(image_tensor[:, :, y:y+384, x:x+384])
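For reference, the scanning loop looks roughly like this. This is a simplified sketch rather than my exact code: the step size, variable names, and the way result is copied into the output mask are illustrative.

import torch

tile = 384
# image_tensor has shape [1, C, H, W]; scan it tile by tile.
# Wrapping the loop in torch.no_grad() guards against autograd buffers
# accumulating between windows, in case predict() does not already disable them.
with torch.no_grad():
    for y in range(0, image_tensor.shape[2] - tile + 1, tile):
        for x in range(0, image_tensor.shape[3] - tile + 1, tile):
            result = model.predict(image_tensor[:, :, y:y + tile, x:x + tile])
            # ... copy result into the full-size output mask here ...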
My Windows machine has a 6 GB GPU, while the Ubuntu machine has an 8 GB GPU. When all models are loaded they consume about 1.4 GB of GPU memory. When processing a large image on Windows, the memory consumption increases to 1.7 GB.
Under Windows the model can handle 25-megapixel images. Under Ubuntu the same code can only process images of up to 5 megapixels. Debugging is difficult because I only have SSH access to the Ubuntu machine.
What could cause this discrepancy, and how can I debug it?
2 Answers
You can check the CUDA version and the torch version.
To debug, you can simply print image_tensor.shape before model.predict; maybe you are running a larger batch size on the Linux machine.
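For example, something like the following right before the prediction call (a minimal sketch; image_tensor, x, and y are the variables from the question's scanning loop), run on both machines so the output can be compared:

import torch

# Versions the two machines may disagree on.
print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())
# The tile actually passed to the model; it should be [1, C, 384, 384] on both machines.
print(image_tensor[:, :, y:y + 384, x:x + 384].shape)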
First approach to solve:
First check the versions of CUDA, PyTorch, and your other Python libraries in general. If they are all equal, then check whether the libraries installed on Ubuntu are compatible with your GPU driver and CUDA version.
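For instance, a short script like this (a sketch, assuming the packages were installed with pip) prints the versions that matter most; run it on both machines and compare line by line:

from importlib.metadata import version

import torch

print("torch:", torch.__version__)
print("CUDA (torch build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("segmentation_models_pytorch:", version("segmentation-models-pytorch"))
props = torch.cuda.get_device_properties(0)
print("GPU:", props.name, "-", round(props.total_memory / 1024 ** 3, 1), "GB")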
As commented by Jimmy, you can use a conda environment or a virtual environment created directly with Python,
python3 -m venv env
to control the libraries’ versions.
To debug:
You can install a Python IDE that allows you to debug. If that is not possible, one very simple but inefficient approach is to add print statements in many parts of your code to see exactly what is different from your execution on Windows.
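Since there is only SSH access to the Ubuntu machine, one practical variant of the print approach is to log CUDA memory usage around each prediction. The calls below are standard torch.cuda utilities; where exactly you place them in the loop is up to you:

import torch

# Place after each model.predict() call in the scanning loop and compare the
# printed values between the Windows and Ubuntu runs.
print("allocated:", torch.cuda.memory_allocated() // 2 ** 20, "MiB",
      "reserved:", torch.cuda.memory_reserved() // 2 ** 20, "MiB")

# For a detailed breakdown once the numbers start to diverge:
print(torch.cuda.memory_summary(abbreviated=True))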