GCP VM not installing nVidia driver properly - Debian

sandeepsign
June 24, 2021
222 views
3 votes
3 Answers

I created a VM using the GCP Console in the browser.

While creating the VM, I selected the image "c2-deeplearning-pytorch-1-8-cu110-v20210619-debian-10" and a T4 GPU.

The VM is created and started, and it shows a green icon in the browser.

When I connect with "gcloud compute ssh ", I am asked whether I want to install the Nvidia driver. I answer Y, but the installation fails with a dpkg error and the driver is not installed:

This VM requires Nvidia drivers to function correctly. Installation takes ~1 minute.
Would you like to install the Nvidia driver? [y/n] y
Installing Nvidia driver.
install linux headers: linux-headers-4.19.0-16-cloud-amd64
E: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem.
Nvidia driver installed.

I then checked whether the driver was installed by running this Python code:

import torch
torch.cuda.is_available()  # returns False
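
A bare False from torch.cuda.is_available() does not say whether the driver, the nvidia-smi tool, or the PyTorch install is at fault. The sketch below is one possible way to gather a few more facts before reinstalling anything. The function name gpu_diagnostics and the report keys are made up for illustration, and it assumes nvidia-smi exits with a non-zero status when it cannot talk to the driver.

```python
import importlib.util
import shutil
import subprocess


def gpu_diagnostics():
    """Collect a few facts that help explain why CUDA is unavailable.

    Hypothetical helper for illustration; not part of PyTorch or the GCP image.
    """
    report = {
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "nvidia_smi_ok": False,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }

    # Assumption: nvidia-smi returns a non-zero exit code when the
    # kernel driver is missing or not loaded.
    if report["nvidia_smi_on_path"]:
        try:
            result = subprocess.run(
                ["nvidia-smi"], capture_output=True, text=True, timeout=30
            )
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            report["nvidia_smi_ok"] = False

    # Only import torch if it is installed, so the script also runs without it.
    if report["torch_installed"]:
        try:
            import torch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except ImportError:
            report["torch_installed"] = False

    return report


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

If nvidia_smi_ok is False while torch_installed is True, the problem is most likely the driver rather than PyTorch.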

Has anybody else faced this issue?

Tags: google-cloud-platform gpuimage

Answers

Chosen as BEST ANSWER
- sandeepsign
- June 24, 2021 at 12:41 am
- 0 votes
The solution to my problem was:
- Manually run: sudo dpkg --configure -a
- Disconnect from the machine.
- Connect again over SSH and select Y again when asked to install the Nvidia driver.
After that, it works.
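
Note that the installer printed "Nvidia driver installed." even though dpkg had failed, so the last line of the output cannot be trusted. As a rough sketch, the check below scans captured installer output for the dpkg error from the question. The function name installer_needs_dpkg_repair is hypothetical, and the pattern only covers the one message quoted above.

```python
import re

# Recovery command quoted in the error message from the question.
DPKG_RECOVERY_COMMAND = "sudo dpkg --configure -a"

# Matches the apt error line seen in the question's output.
_DPKG_INTERRUPTED = re.compile(r"E:\s*dpkg was interrupted", re.IGNORECASE)


def installer_needs_dpkg_repair(installer_output: str) -> bool:
    """Return True if the output contains the 'dpkg was interrupted' error.

    Hypothetical helper for illustration only.
    """
    # Collapse whitespace so terminal line wrapping does not hide the message.
    normalized = " ".join(installer_output.split())
    return _DPKG_INTERRUPTED.search(normalized) is not None


if __name__ == "__main__":
    sample = (
        "Installing Nvidia driver. install linux headers:\n"
        "linux-headers-4.19.0-16-cloud-amd64 E: dpkg was interrupted, you must\n"
        "manually run 'sudo dpkg --configure -a' to correct the problem.\n"
        "Nvidia driver installed.\n"
    )
    if installer_needs_dpkg_repair(sample):
        print("Driver install did not complete; run:", DPKG_RECOVERY_COMMAND)
```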

- razimbres
- June 24, 2021 at 12:41 am
- 0 votes
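To install the NVIDIA driver on a GCP instance manually, first remove any existing NVIDIA packages:
```
cd /

# Quote the pattern so the shell does not expand it against local files
sudo apt purge 'nvidia-*'
```
Reboot, then download and run the CUDA installer, which bundles the 460.32.03 driver:
```
cd /

sudo wget https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
sudo sh cuda_11.2.2_460.32.03_linux.run
```
Adjust the options as the installer prompts for them in the terminal, then reboot again.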

- Ali
- March 16, 2022 at 5:13 pm
- 0 votes
Make sure you are running as root. I know this sounds silly, but on their notebook instances the default user is not root. If you SSH into the instance and run something like gpustat, or your own code, you might get errors such as "NVIDIA drivers are not loaded".

If your user (called jupyter by default) is in the sudoers, everything works fine.

Installing or reinstalling GPU drivers on GCP instances is often very complicated, so make sure you actually need to reinstall before you attempt other solutions.
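
Before reinstalling, it may help to confirm which privileges the current session actually has. This is only a sketch: the function name privilege_summary is made up, and it assumes sudo rights come from membership in a group named "sudo", which may not match how a given GCP image grants them (sudoers rules can also name users or other groups directly).

```python
import grp
import os


def privilege_summary(admin_group: str = "sudo") -> dict:
    """Report whether this process is root or belongs to an admin group.

    Hypothetical helper; the group name "sudo" is an assumption and can be
    changed through the admin_group parameter.
    """
    is_root = os.geteuid() == 0

    # Look up the group's numeric ID; the group may not exist on this system.
    try:
        admin_gid = grp.getgrnam(admin_group).gr_gid
    except KeyError:
        admin_gid = None

    # os.getgroups() lists the supplementary group IDs of this process.
    in_admin_group = admin_gid is not None and admin_gid in os.getgroups()

    return {"is_root": is_root, "in_admin_group": in_admin_group}


if __name__ == "__main__":
    for key, value in privilege_summary().items():
        print(f"{key}: {value}")
```

If both values are False, running sudo -l in the same session shows what the sudoers configuration actually allows for that user.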
