I created a VM using the GCP Console in the browser.
While creating the VM, I selected the image "c2-deeplearning-pytorch-1-8-cu110-v20210619-debian-10" and a T4 GPU.
The VM is created and started, and it shows a green icon in the browser.
When I connect with "gcloud compute ssh", it asks whether I want to install the NVIDIA driver. I answer y, but it then fails with an error (dpkg was interrupted) and the driver is not installed:
This VM requires Nvidia drivers to function correctly. Installation takes ~1 minute.
Would you like to install the Nvidia driver? [y/n] y
Installing Nvidia driver.
install linux headers: linux-headers-4.19.0-16-cloud-amd64
E: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem.
Nvidia driver installed.
To verify whether the driver is installed, I ran this Python code:
import torch
torch.cuda.is_available()  # returns False
Has anybody else faced this issue?
3 Answers
The solution to my problem was:
It works after that.
This is the correct way to install the NVIDIA driver on a GCP instance:
Reboot
Adjust your configuration as the options pop up in the terminal
Reboot
Make sure you are running as root. I know this sounds silly, but if you use their notebook instances, the default user is not root. If you SSH into the instance and run something like gpustat, or run custom code, you might get errors saying the NVIDIA drivers are not loaded. If you make sure your user (called jupyter by default) is in the sudoers, everything will work fine.
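To see whether permissions are really the problem, a small check along these lines may help. It is only a sketch: the function name gpu_access_report is made up for illustration, and it assumes the driver exposes its device nodes as /dev/nvidia*, which is the usual layout on Linux but is not confirmed for this image.

```python
import glob
import grp
import os


def _group_name(gid):
    """Return the group name for a gid, or the gid as text if it has no entry."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def gpu_access_report():
    """Summarise who we are and whether the NVIDIA device nodes are accessible."""
    # Assumption: the driver creates its device nodes under /dev/nvidia*.
    devices = sorted(glob.glob("/dev/nvidia*"))
    return {
        "euid": os.geteuid(),
        "is_root": os.geteuid() == 0,
        "groups": sorted(_group_name(gid) for gid in os.getgroups()),
        # os.access checks read/write permission using the real uid/gid.
        "devices": {dev: os.access(dev, os.R_OK | os.W_OK) for dev in devices},
    }


if __name__ == "__main__":
    for key, value in gpu_access_report().items():
        print(f"{key}: {value}")
```

An empty "devices" entry suggests the driver is not loaded at all, in which case changing users is unlikely to help. Entries mapped to False point to a permissions issue for the current user.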
It is often very complicated to install or reinstall GPU drivers on GCP instances. Make sure you actually need to reinstall before you attempt other solutions.
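One way to decide whether a reinstall is needed is to gather a few cheap signals first. The following is a rough sketch, not an official procedure: driver_status is a hypothetical helper name, and it assumes that nvidia-smi ships with the driver and that the loaded kernel module exposes /proc/driver/nvidia/version, as the NVIDIA Linux driver normally does.

```python
import shutil
import subprocess
from pathlib import Path


def driver_status():
    """Collect simple indicators of whether the NVIDIA driver is usable."""
    status = {
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "nvidia_smi_ok": False,
        # Assumption: this file exists only while the NVIDIA kernel module is loaded.
        "kernel_module_loaded": Path("/proc/driver/nvidia/version").exists(),
        # None means PyTorch could not be imported in this environment.
        "torch_cuda_available": None,
    }
    if status["nvidia_smi_on_path"]:
        try:
            result = subprocess.run(
                ["nvidia-smi"], capture_output=True, text=True, timeout=30
            )
            status["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            pass  # leave nvidia_smi_ok as False
    try:
        import torch

        status["torch_cuda_available"] = bool(torch.cuda.is_available())
    except Exception:
        pass  # PyTorch missing or broken; keep None
    return status


if __name__ == "__main__":
    for key, value in driver_status().items():
        print(f"{key}: {value}")
```

If the kernel module is loaded and nvidia-smi succeeds but torch.cuda.is_available() is still False, the problem is more likely in the PyTorch/CUDA setup than in the driver, so reinstalling the driver may not be the right fix.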