I am trying to start slurmd:
sudo systemctl start slurmd
When I check the status of the daemon, it reports an error:
sudo systemctl status slurmd
● slurmd.service - Slurm node daemon
Loaded: loaded (/lib/systemd/system/slurmd.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2020-06-29 18:13:06 MSK; 2s ago
Docs: man:slurmd(8)
Process: 13402 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=1/FAILURE)
июн 29 18:13:06 ecm systemd[1]: Starting Slurm node daemon...
июн 29 18:13:06 ecm slurmd-ecm[13402]: Message aggregation disabled
июн 29 18:13:06 ecm slurmd-ecm[13402]: error: cgroup namespace 'freezer' not mounted. aborting
июн 29 18:13:06 ecm slurmd-ecm[13402]: error: unable to create freezer cgroup namespace
июн 29 18:13:06 ecm slurmd-ecm[13402]: error: Couldn't load specified plugin name for proctrack/cgroup: Plugin init() callback failed
июн 29 18:13:06 ecm slurmd-ecm[13402]: error: cannot create proctrack context for proctrack/cgroup
июн 29 18:13:06 ecm systemd[1]: slurmd.service: Control process exited, code=exited, status=1/FAILURE
июн 29 18:13:06 ecm slurmd-ecm[13402]: error: slurmd initialization failed
июн 29 18:13:06 ecm systemd[1]: slurmd.service: Failed with result 'exit-code'.
июн 29 18:13:06 ecm systemd[1]: Failed to start Slurm node daemon.
I don't know how to fix this and would appreciate any help. I am using Slurm 18.08.05 on Debian 10.
Update:
I changed the ProctrackType value in slurm.conf to proctrack/linuxproc:
ProctrackType=proctrack/linuxproc
Everything works now.
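To confirm which process-tracking plugin a node is configured to load, it can help to read the setting back from the configuration file. This is only a minimal sketch: proctrack_type is a hypothetical helper name, and the default path /etc/slurm/slurm.conf is an assumption (the location varies by distribution, so pass the real path as an argument).

```shell
#!/bin/sh
# proctrack_type: hypothetical helper that prints the ProctrackType value
# found in a slurm.conf-style file.
# The default path is an assumption; pass the real path as the first argument.
proctrack_type() {
    conf="${1:-/etc/slurm/slurm.conf}"
    awk -F= '
        # Skip lines that are entirely comments.
        /^[ \t]*#/ { next }
        # Match the key case-insensitively and remember the last value seen.
        tolower($1) ~ /^[ \t]*proctracktype[ \t]*$/ {
            value = $2
            sub(/[ \t]*#.*$/, "", value)        # drop a trailing comment
            gsub(/^[ \t]+|[ \t]+$/, "", value)  # trim surrounding whitespace
            last = value
        }
        END { if (last != "") print last }
    ' "$conf"
}
```

For example, proctrack_type /path/to/slurm.conf should print proctrack/linuxproc after the change described above, and print nothing if the key is not set in the file.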
3 Answers
Contrary to what the documentation (man cgroup.conf) says, the default value of the CgroupMountpoint parameter does not work here.
You can also reset the value of ProctrackType.
Tested on Debian 10.7 with slurmd from slurm-wlm 18.08.5-2.
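Before changing CgroupMountpoint, it may be worth checking whether the freezer controller is mounted at all and where, since the log complains that the 'freezer' namespace is not mounted. The sketch below assumes a cgroup v1 layout as listed in /proc/mounts; freezer_mountpoint is a hypothetical helper name, not part of Slurm.

```shell
#!/bin/sh
# freezer_mountpoint: hypothetical helper that prints the mount point of the
# cgroup v1 hierarchy carrying the freezer controller (prints nothing if it
# is not mounted). Reads /proc/mounts unless another file is given.
freezer_mountpoint() {
    mounts="${1:-/proc/mounts}"
    # Fields: device, mount point, filesystem type, options, dump, pass.
    # Wrap the option list in commas so "freezer" matches only as a whole option.
    awk '$3 == "cgroup" && ("," $4 ",") ~ /,freezer,/ { print $2 }' "$mounts"
}

mp="$(freezer_mountpoint)"
if [ -n "$mp" ]; then
    echo "freezer cgroup mounted at: $mp"
else
    echo "freezer cgroup is not mounted"
fi
```

If a path such as /sys/fs/cgroup/freezer is reported, then, as far as I understand the man page, CgroupMountpoint should name its parent directory. If nothing is reported, the controller is not mounted and the mount point setting alone is unlikely to help.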
In my case, this happened because I didn’t create and configure my cgroup.conf on the nodes running slurmd. Once this was added to the same directory as slurm.conf, it worked fine. CgroupMountpoint did not need to be defined as the default was sufficient.
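For illustration, a cgroup.conf can be placed next to slurm.conf along these lines. This is a sketch, not the file the answer above used: write_cgroup_conf is a hypothetical helper, and the three settings are assumptions taken from the cgroup.conf man page for the 18.08 series (CgroupAutomount=yes asks the plugin to try mounting missing subsystems itself). Check man cgroup.conf for your version before using any of them.

```shell
#!/bin/sh
# write_cgroup_conf: hypothetical helper that writes a minimal cgroup.conf
# into the directory given as the first argument (the one holding slurm.conf).
write_cgroup_conf() {
    dir="$1"
    cat > "$dir/cgroup.conf" <<'EOF'
# Minimal cgroup.conf sketch; values are assumptions, adjust to your site.
CgroupAutomount=yes
ConstrainCores=no
ConstrainRAMSpace=no
EOF
}
```

A possible use is to call write_cgroup_conf with the configuration directory as root on each node running slurmd, and then restart the daemon with sudo systemctl restart slurmd.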
I had the same error on my cluster: my cgroup.conf wasn't configured. A simple /etc/slurm/cgroup.conf with:
then: