The guides I have for deploying LXC on CentOS say to install LXD via snapd:
https://www.cyberciti.biz/faq/set-up-use-lxd-on-centos-rhel-8-x/
snapd is a service that installs snap packages (Canonical's Ubuntu-originated packaging format); the logic being that the LXD snap is the most up-to-date build available on that platform.
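For reference, the install steps from that guide look roughly like this on an EL8 host (a sketch; it assumes EPEL is available and the exact package names may differ on OL8):
dnf install epel-release
dnf install snapd
systemctl enable --now snapd.socket
ln -s /var/lib/snapd/snap /snap
snap install lxd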
Well, I’m open to installing an alternative version if it makes GPU passthrough easier. Ultimately I’m trying to build a container environment where I can run the latest versions of Python and Jupyter with GPU support.
I have some guides on how to enable gpu passthrough.
https://theorangeone.net/posts/lxc-nvidia-gpu-passthrough/
LXC GPU Passthrough by u/cobbian in r/Proxmox
I’ve added the following kernel modules on my OL8 host (/etc/modules-load.d/vfio-pci.conf):
# Nvidia modules
nvidia
nvidia_uvm
#noticed snapd has a modules file I can't edit
/var/lib/snapd/snap/core18/1988/etc/modules-load.d/modules.conf
Then modified grub
nano /etc/default/grub
#https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/installation_guide/appe-configuring_a_hypervisor_host_for_pci_passthrough
# appended to GRUB_CMDLINE_LINUX (instead of iommu=on amd_iommu=on)
iommu=pt amd_iommu=pt
grub2-mkconfig -o /boot/grub2/grub.cfg
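For reference, the resulting line in /etc/default/grub ends up looking something like this (the other options are just whatever your install already has on that line):
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet iommu=pt amd_iommu=pt"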
Then added udev rules
nano /etc/udev/rules.d/70-nvidia.rules
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"
#reboot
Then added the gpu to the container's lxc.conf (device major numbers taken from ls -l /dev/nvidia*)
nano /var/snap/lxd/common/lxd/logs/nvidia-test/lxc.conf
# Allow cgroup access
lxc.cgroup.devices.allow: c 195:* rwm
lxc.cgroup.devices.allow: c 243:* rwm
# Pass through device files
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
Inside the LXC container I started (OL8):
# installed the nvidia driver package that provides nvidia-smi
nvidia-driver-cuda-3:460.32.03-1.el8.x86_64
# installed cuda
cuda-11-2-11.2.2-1.x86_64
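Roughly how those packages were pulled in inside the container (a sketch assuming the NVIDIA CUDA rhel8 repo; your repo setup may differ):
dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
dnf install nvidia-driver-cuda cuda-11-2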
When I go to run nvidia-smi:
[root@nvidia-test ~]# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Because I couldn’t edit the snapd modules file, I thought to manually copy the nvidia kernel module files over and insmod them (dependencies determined using modprobe --show-depends):
[root@nvidia-test ~]# insmod nvidia.ko.xz NVreg_DynamicPowerManagement=0x02
insmod: ERROR: could not insert module nvidia.ko.xz: Function not implemented
Some diagnostic information from inside my container:
[root@nvidia-test ~]# find /sys | grep dmar
find: '/sys/kernel/debug': Permission denied
find: '/sys/fs/pstore': Permission denied
find: '/sys/fs/fuse/connections/59': Permission denied
[root@nvidia-test ~]# lspci | grep -i nvidia
05:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P1000] (rev a1)
05:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
So… is there something else I should do? Should I remove snapd lxd and go with the default lxc provided by OL8?
Answers
Found the answer
#https://ubuntu.com/blog/nvidia-cuda-inside-a-lxd-container
You can use GPU passthrough to a LXD container by creating a LXD gpu device. This gpu device will collectively do all the necessary tasks to expose the GPU to the container, including the configuration you made above explicitly. Here is the documentation with all the extra parameters (for example, how to distinguish between GPUs if there is more than one):
https://linuxcontainers.org/lxd/docs/master/instances#type-gpu
In the simplest form, you can run the following on an existing container to add the default GPU to it.
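The command itself isn't quoted above, but in its simplest form it is something along these lines (using the nvidia-test container from earlier; the device name mygpu is arbitrary):
lxc config device add nvidia-test mygpu gpu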
When you add a GPU to an NVIDIA container, you also need to add the corresponding NVIDIA runtime to the container (so that it matches the kernel version on the host!). In containers we do not need to (and cannot) add kernel drivers, but we do need to add the runtime (libraries, utilities, and other software). LXD takes care of this and downloads the appropriate version of the NVIDIA container runtime for you and attaches it to the container. Here is a full example that creates a container with the NVIDIA runtime enabled, and then adds the NVIDIA GPU device to that container.
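The commands aren't quoted above either, but the full example looks roughly like this (the image alias and the container name cuda-test are just placeholders):
# enable the NVIDIA runtime when the container is created
lxc launch ubuntu:20.04 cuda-test -c nvidia.runtime=true
# attach the GPU device
lxc config device add cuda-test mygpu gpu
# verify from inside the container
lxc exec cuda-test -- nvidia-smi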
If you are often creating such GPU containers, you can create a LXD profile with the GPU configuration. Then, when you want a GPU container, you can either launch the container with the nvidia profile, or apply the nvidia profile to existing containers and thus make them GPU containers! We have been using the snap package of LXD for all of the above instructions.
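A sketch of what such a profile could look like (the profile name nvidia and the container names are just examples):
lxc profile create nvidia
lxc profile set nvidia nvidia.runtime true
lxc profile device add nvidia gpu gpu
# launch a new container with the profile, or add it to an existing one
lxc launch ubuntu:20.04 gpu-box -p default -p nvidia
lxc profile add nvidia-test nvidia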