
I have followed this GCP guide on Ubuntu 18 and 20 (and have also tried Ubuntu Lite, Debian, and CentOS 7), but after completing the lengthy install I get this:

me@gpu:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

I have tried installing via the script and via the direct downloads from the NVIDIA site for CUDA 10. Ready to pull my hair out if that helps! I don’t understand how a company that builds a bazillion GPUs can’t make the installation process robust.

I have also tried these recommendations with no luck.

2 Answers


  1. Chosen as BEST ANSWER

    I was able to get it working. My mistake was skipping the pre-installation steps before running the cuda_10.1.243_418.87.00_linux.run script; I was under the impression that the *.run file would do everything for me. It would help if users were told they MUST do the pre-installation steps. Specifically, I had to do this on Ubuntu 18:

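    # Create the blacklist file and add the two commented lines below to it
    sudo nano /etc/modprobe.d/blacklist-nouveau.conf
    #   blacklist nouveau
    #   options nouveau modeset=0
    # Rebuild the initramfs so the blacklist applies at boot, then reboot
    sudo update-initramfs -u
    sudo reboot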
    

    This seems like a bit of a “hack”, so I'm not sure why NVIDIA can’t make the installation process more robust. They make a bazillion of these cards; it’s not like some homemade product with a niche user base…
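
For anyone who prefers not to edit the file by hand in nano, here is a minimal sketch of the same pre-installation step done non-interactively. The function names and the BLACKLIST_FILE variable are made up for illustration; the file path and the two config lines are the ones from the answer above.

```shell
#!/usr/bin/env bash
# Sketch only: write the nouveau blacklist without opening an editor.
# BLACKLIST_FILE and both function names are illustrative assumptions,
# not part of any NVIDIA or Ubuntu tooling.
BLACKLIST_FILE="${BLACKLIST_FILE:-/etc/modprobe.d/blacklist-nouveau.conf}"

# Print the two lines that the answer puts into the blacklist file.
nouveau_blacklist_lines() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Overwrite the blacklist file with those lines.
# Needs root when BLACKLIST_FILE points into /etc/modprobe.d.
write_nouveau_blacklist() {
    nouveau_blacklist_lines > "$BLACKLIST_FILE"
}
```

Run as root, the remaining steps would be `write_nouveau_blacklist && update-initramfs -u && reboot`, followed by the .run installer after the reboot.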


  2. If you’ve installed the driver many times and nvidia-smi still fails to communicate with it, take a look at prime-select.

    1. Run prime-select query to see which profile is currently selected. The available profiles should include at least nvidia and intel.

    2. Select the NVIDIA profile with prime-select nvidia.

    3. If nvidia was already selected, switch to a different profile first, e.g. prime-select intel, then switch back with prime-select nvidia.

    4. Reboot and check nvidia-smi.
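
The first three steps above can be sketched as a small script. This is only an illustration: the reselect_nvidia function and the PRIME_SELECT override are hypothetical names, and the sketch assumes prime-select is installed and that `prime-select query` prints the current profile name on its own line.

```shell
#!/usr/bin/env bash
# Sketch only: re-select the nvidia profile, toggling through intel first
# if nvidia is already the active profile.
# PRIME_SELECT is a hypothetical override so the sequence can be dry-run.
PRIME_SELECT="${PRIME_SELECT:-prime-select}"

reselect_nvidia() {
    local current
    # Step 1: ask which profile is active right now.
    current="$("$PRIME_SELECT" query)" || return 1
    echo "current profile: $current"
    # Step 3: if nvidia is already selected, switch away first.
    if [ "$current" = "nvidia" ]; then
        "$PRIME_SELECT" intel || return 1
    fi
    # Step 2: select (or switch back to) nvidia.
    "$PRIME_SELECT" nvidia
}
```

Switching profiles normally needs root, so the function would be called from a root shell, followed by a reboot and `nvidia-smi` as in step 4.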

    It may also be worth running this again:

    sudo apt install nvidia-cuda-toolkit
    

    When it finishes, reboot the machine; nvidia-smi should then work.
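
If nvidia-smi still fails after the reboot, it can help to check which kernel module actually got loaded. Below is a rough sketch that classifies `lsmod` output; the driver_state function name is made up for this example, and it only looks at the module name in the first column.

```shell
#!/usr/bin/env bash
# Sketch only: classify lsmod-style output read from stdin.
# Prints "nvidia" if the nvidia module is loaded, "nouveau" if only
# nouveau is loaded, and "none" if neither appears.
# driver_state is an illustrative name, not an existing tool.
driver_state() {
    awk '
        $1 == "nvidia"  { nvidia = 1 }
        $1 == "nouveau" { nouveau = 1 }
        END {
            if (nvidia)       print "nvidia"
            else if (nouveau) print "nouveau"
            else              print "none"
        }'
}
```

Usage would be `lsmod | driver_state`. A result of "nouveau" suggests the blacklist step from the accepted answer has not taken effect, and "none" suggests the NVIDIA kernel module was never built or loaded.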

    In other cases, it helps to follow these instructions for installing cuDNN and CUDA on VMs: cuda_11.2_installation_on_Ubuntu_20.04.

    Finally, in some cases the problem is caused by unattended-upgrades. Check its settings and adjust them if it is causing unexpected results. The documentation for Debian, which I see you have already tested, is here: UnattendedUpgrades.
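
As a quick way to see whether unattended upgrades are switched on at all, something like the sketch below may help. It assumes the setting lives in /etc/apt/apt.conf.d/20auto-upgrades, which is the usual location on Debian and Ubuntu but may differ on your image; the function name and the AUTO_UPGRADES_FILE variable are invented for this example.

```shell
#!/usr/bin/env bash
# Sketch only: report whether APT's periodic unattended upgrade is enabled.
# AUTO_UPGRADES_FILE is an assumed default path; adjust it for your system.
AUTO_UPGRADES_FILE="${AUTO_UPGRADES_FILE:-/etc/apt/apt.conf.d/20auto-upgrades}"

# Succeeds (exit 0) if the file sets APT::Periodic::Unattended-Upgrade
# to a non-zero interval; fails if the file is missing or the value is "0".
unattended_upgrades_enabled() {
    [ -r "$AUTO_UPGRADES_FILE" ] || return 1
    grep -Eq '^[[:space:]]*APT::Periodic::Unattended-Upgrade[[:space:]]+"[1-9][0-9]*"' \
        "$AUTO_UPGRADES_FILE"
}
```

For example, `unattended_upgrades_enabled && echo "unattended upgrades are on"`. If they are on, an automatic kernel update is one possible reason a previously working driver stops loading, which would match the case described above.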
