I am using PyTorch Geometric to train a graph neural network. The problem that led to this question is the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)
So, I am trying to check which device the tensors are loaded on, and when I run data.x.get_device() and data.edge_index.get_device(), I get -1 for each. What does -1 mean?
In general, I am a bit confused about when I need to transfer data to the device (whether CPU or GPU), but I assume that for each epoch I simply call .to(device) on my tensors to move them to the proper device (as of now I am not using .to(device), since I am just testing on the CPU).
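To make this concrete, here is a minimal, self-contained version of the pattern I am describing (a toy graph and a single GCN layer stand in for my real dataset and model):

```python
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

device = torch.device("cpu")  # will switch to "cuda" once the device issue is resolved

# Toy graph: 3 nodes with 4 features each, two undirected edges
x = torch.randn(3, 4)
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]], dtype=torch.long)
y = torch.tensor([0, 1, 0])
data = Data(x=x, edge_index=edge_index, y=y)

model = GCNConv(4, 2).to(device)  # model parameters on `device`
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(5):
    data = data.to(device)                 # moves data.x, data.edge_index, data.y together
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out, data.y)
    loss.backward()
    optimizer.step()
```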
Additional context:
I am running Ubuntu 20, and I did not see this issue until installing CUDA (i.e., I was able to train/test the model on the CPU, but I only started having this issue after installing CUDA and updating the NVIDIA drivers).
I have CUDA 11.7 installed on my system with an NVIDIA driver compatible up to CUDA 12 (e.g., CUDA 12 is listed by nvidia-smi), and the output of torch.version.cuda is 11.7. Regardless, I am simply trying to use the CPU at the moment, but will use the GPU once this device issue is resolved.
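For reference, this is how I checked those values from Python (standard torch attributes):

```python
import torch

print(torch.__version__)          # installed PyTorch build
print(torch.version.cuda)         # CUDA version PyTorch was compiled against (11.7 here)
print(torch.cuda.is_available())  # whether PyTorch can currently see a GPU
```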
2 Answers
It means the data is still on the CPU: -1 indicates the tensor is not on any GPU device. To move it to the GPU, use .to("cuda") (or .cuda()).
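For example, with a PyTorch Geometric Data object (a small sketch; the toy graph is just for illustration):

```python
import torch
from torch_geometric.data import Data

data = Data(x=torch.randn(3, 4),
            edge_index=torch.tensor([[0, 1], [1, 2]], dtype=torch.long))

print(data.x.get_device())        # -1 -> the tensor lives on the CPU

if torch.cuda.is_available():
    data = data.to("cuda")        # moves x, edge_index, etc. onto the GPU
    print(data.x.get_device())    #  0 -> the tensor lives on cuda:0
```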
-1 means the tensors are on the CPU. When you do .to(device), what is your device variable initialized as? If you want to use only the CPU, I suggest initializing the device as device = torch.device("cpu") and running your Python code with CUDA_VISIBLE_DEVICES='' python your_code.py ...
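For example, a common way to initialize device so the same script works on either CPU or GPU (a sketch, not tied to your code):

```python
import torch

# Pick the GPU only if one is visible; setting CUDA_VISIBLE_DEVICES='' forces the CPU branch.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(8, 16).to(device)   # every tensor fed to the model goes through .to(device)
print(x.get_device())               # -1 on the CPU, 0 on cuda:0
```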
Typically, if you are passing your tensors to a model, PyTorch expects them to be on the same device as your model. If you are passing multiple tensors to a method, such as your loss function nn.CrossEntropyLoss(), PyTorch expects both tensors (predictions and labels) to be on the same device.
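As a concrete sketch of that point (a toy linear model with arbitrary shapes):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(16, 4).to(device)            # model parameters live on `device`
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(8, 16).to(device)         # inputs on the same device as the model
labels = torch.randint(0, 4, (8,)).to(device)  # labels on the same device as the predictions

preds = model(inputs)              # works: inputs and weights are on the same device
loss = criterion(preds, labels)    # works: predictions and labels are on the same device
loss.backward()
```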