I am facing an issue when starting the slurmd service on my compute nodes:
```text
× slurmd.service - Slurm node daemon
Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2022-10-12 04:10:25 EDT; 7s ago
Process: 5839 ExecStart=/usr/sbin/slurmd -D -s $SLURMD_OPTIONS (code=exited, status=1/FAILURE)
Main PID: 5839 (code=exited, status=1/FAILURE)
CPU: 3ms
Oct 12 04:10:25 compute1.ghpcv3.au.dk systemd[1]: Started Slurm node daemon.
Oct 12 04:10:25 compute1.ghpcv3.au.dk systemd[1]: slurmd.service: Main process exited, code=exited, status=1/FAILURE
Oct 12 04:10:25 compute1.ghpcv3.au.dk systemd[1]: slurmd.service: Failed with result 'exit-code'.
```
Running slurmd in the foreground with extra verbosity shows why it fails:

```text
# slurmd -D -vv
slurmd: debug: Log file re-opened
slurmd: debug: CPUs:1 Boards:1 Sockets:1 CoresPerSocket:1 ThreadsPerCore:1
slurmd: error: Couldn't find the specified plugin name for cgroup/v2 looking at all files
slurmd: error: cannot find cgroup plugin for cgroup/v2
slurmd: error: cannot create cgroup context for cgroup/v2
slurmd: error: Unable to initialize cgroup plugin
slurmd: error: slurmd initialization failed
```
What am I missing?
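To see which cgroup version a node is running (and therefore which plugin slurmd will look for) and which cgroup plugins the installed Slurm actually contains, a quick check like the one below may help. It is a sketch under assumptions: the cgroup filesystem is mounted at /sys/fs/cgroup, the plugins live in /usr/lib64/slurm (a common location for RPM installs; the authoritative path is the PluginDir setting in slurm.conf), and the plugin files are named cgroup_v1.so and cgroup_v2.so, as in Slurm 22.05 and later. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helpers; default paths are assumptions.

# Print "v2" if the unified cgroup v2 hierarchy is mounted at the given
# root (it exposes a cgroup.controllers file), otherwise "v1".
# Simplification: hybrid and legacy layouts both report "v1" here.
detect_cgroup_version() {
    local root="${1:-/sys/fs/cgroup}"
    if [ -f "$root/cgroup.controllers" ]; then
        echo "v2"
    else
        echo "v1"
    fi
}

# List the cgroup plugin files found in the plugin directory, one per line.
# Prints nothing if none are present.
list_cgroup_plugins() {
    local plugin_dir="${1:-/usr/lib64/slurm}"
    local f
    for f in "$plugin_dir"/cgroup_v*.so; do
        if [ -e "$f" ]; then
            basename "$f"
        fi
    done
}
```

If `detect_cgroup_version` prints v2 but `list_cgroup_plugins` does not print cgroup_v2.so, the error above is what you would expect to see.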
Answers
You may have to manually create cgroup.conf in your Slurm config directory: https://stackoverflow.com/a/65226055/5749775
I fixed this by creating a fairly simple conf:
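If no cgroup.conf exists yet, a minimal one could be written as below. This is a sketch, not the configuration from the linked answer. It assumes Slurm 22.05 or later, where the CgroupPlugin key accepts autodetect, cgroup/v1 or cgroup/v2, and it assumes /etc/slurm as the config directory. Whichever plugin is selected must still be present in PluginDir, so this file alone cannot supply a missing cgroup_v2.so. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal cgroup.conf into a config directory.
# Assumes Slurm >= 22.05 (CgroupPlugin key); /etc/slurm is an assumed default.
write_cgroup_conf() {
    local conf_dir="${1:-/etc/slurm}"
    local plugin="${2:-autodetect}"   # autodetect, cgroup/v1 or cgroup/v2
    # printf reuses the format for each argument, producing one line per argument.
    printf '%s\n' \
        "# Minimal cgroup.conf (sketch)" \
        "CgroupPlugin=$plugin" > "$conf_dir/cgroup.conf"
}
```

For example, `write_cgroup_conf /etc/slurm autodetect` on each node, followed by `systemctl restart slurmd`.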
I had the same problem. Slurm supports both cgroup/v1 and cgroup/v2, but cgroup/v2 support is only compiled in if the dbus development files are present at build time. So first install dbus-devel and then run a clean Slurm build.
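A sketch of that sequence on a RHEL-family node (the systemctl output suggests one, but that is an assumption), building RPMs from the release tarball with `rpmbuild -ta`. Building from a fresh tarball means configure runs again and can detect dbus; reusing an already-configured source tree may not. The tarball name and the /usr/lib64/slurm plugin directory are placeholders, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; nothing runs until you call them.

# Step 1: install the dbus development files so configure enables cgroup/v2.
install_dbus_devel() {
    sudo dnf install -y dbus-devel
}

# Step 2: rebuild Slurm RPMs from a fresh release tarball so configure
# runs again and picks up dbus.
rebuild_slurm_rpms() {
    local tarball="$1"   # e.g. slurm-<version>.tar.bz2 (placeholder)
    rpmbuild -ta "$tarball"
}

# Step 3: after installing the new packages, confirm the plugin exists.
check_cgroup_v2_plugin() {
    local plugin_dir="${1:-/usr/lib64/slurm}"   # assumed PluginDir
    if [ -f "$plugin_dir/cgroup_v2.so" ]; then
        echo "cgroup_v2.so found in $plugin_dir"
    else
        echo "cgroup_v2.so missing from $plugin_dir" >&2
        return 1
    fi
}
```

Run the steps in order on the build host, install the resulting packages (rpmbuild places them under ~/rpmbuild/RPMS/ by default) on the compute nodes, then restart slurmd.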