I have followed this GCP guide on Ubuntu 18 and 20 (and have also tried Ubuntu Lite, Debian and CentOS 7) but, unfortunately, after completing the lengthy install I get this:
me@gpu:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running
I have tried installing via the script and via the direct downloads from the NVIDIA site for CUDA 10. Ready to pull my hair out if that helps! I don’t understand how a company that builds a bazillion GPUs can’t make the installation process robust.
I have also tried these recommendations with no luck.
2 Answers
I was able to get it working. The mistake I was making was not doing the pre-installation steps before running the cuda_10.1.243_418.87.00_linux.run script. I was under the impression the *.run file would do everything for me. It would help if users were told they MUST do the pre-installation steps. Specifically I had to do this for Ubuntu 18:
This seems like a bit of a “hack”, so I am not sure why NVIDIA can’t make the installation process more robust. They make a bazillion of these cards. It’s not like some homemade product with a niche user base…
If you’ve installed the driver so many times and nvidia-smi is still failing to communicate, take a look at prime-select.
Run prime-select query; this way you are going to get all possible options, and it should show at least nvidia | intel.
Select prime-select nvidia.
Then, if you see "nvidia is already selected", choose a different one, e.g. prime-select intel. Next, switch back to nvidia with prime-select nvidia.
Reboot and check nvidia-smi.
Plus, it could be a good idea to run again:
When it finishes, reboot the machine, and nvidia-smi should work then.
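The prime-select steps above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names run and reselect_nvidia and the DRY_RUN variable are hypothetical, and the use of sudo assumes prime-select needs root on your image. With DRY_RUN=1 the commands are printed instead of executed, which is a safe way to preview what would happen.

```shell
#!/bin/sh
# Sketch only: re-apply the nvidia profile with prime-select.
# run, reselect_nvidia and DRY_RUN are hypothetical names, not part of any tool.

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 is the currently selected profile (e.g. the output of `prime-select query`).
reselect_nvidia() {
    current="$1"
    # If nvidia is already selected, switch away first so that
    # selecting nvidia again is not a no-op.
    if [ "$current" = "nvidia" ]; then
        run sudo prime-select intel    # sudo is an assumption
    fi
    run sudo prime-select nvidia
}
```

To use it, call reselect_nvidia "$(prime-select query)", then reboot and check nvidia-smi again.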
Now, in other cases it works to follow these instructions to install cuDNN and CUDA on VMs: cuda_11.2_installation_on_Ubuntu_20.04.
And finally, in some other cases the problem is caused by unattended-upgrades. Take a look at its settings and adjust them if it is causing unexpected results. The documentation for Debian, a distro I can see you have already tested, is here: UnattendedUpgrades.
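As a starting point for looking at those settings, here is a minimal sketch that checks whether unattended upgrades are switched on. It assumes the usual Debian/Ubuntu location /etc/apt/apt.conf.d/20auto-upgrades, which may differ on your image; the function name unattended_upgrades_enabled is hypothetical.

```shell
#!/bin/sh
# Sketch only: report whether an APT periodic config enables unattended upgrades.
# unattended_upgrades_enabled is a hypothetical helper name.
# The default path is an assumption; pass another file as $1 to check it instead.
unattended_upgrades_enabled() {
    conf="${1:-/etc/apt/apt.conf.d/20auto-upgrades}"
    # A missing or unreadable file is treated as "not enabled".
    [ -r "$conf" ] || return 1
    # Match an uncommented line such as: APT::Periodic::Unattended-Upgrade "1";
    grep -Eq '^[[:space:]]*APT::Periodic::Unattended-Upgrade[[:space:]]+"1";' "$conf"
}
```

The function returns 0 when the setting is "1" and non-zero otherwise, so it can be used directly in an if statement before deciding whether to change anything.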