I cannot resolve the following issue:
root@MyCluster:/opt/WorkLoadManager/slurm/23.11.5# systemctl status slurmctld
× slurmctld.service - Slurm controller daemon
Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2024-04-09 16:18:33 CEST; 4s ago
Process: 2592430 ExecStart=/opt/WorkLoadManager/slurm/23.11.5/sbin/slurmctld --systemd $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
Main PID: 2592430 (code=exited, status=1/FAILURE)
CPU: 44ms
Apr 09 16:18:32 MyCluster.num.lab slurmctld[2592430]: slurmctld: Job accounting information stored, but details not gathered
Apr 09 16:18:32 MyCluster.num.lab slurmctld[2592430]: slurmctld: slurmctld version 23.11.5 started on cluster MyCluster
Apr 09 16:18:32 MyCluster.num.lab systemd[1]: Started Slurm controller daemon.
Apr 09 16:18:32 MyCluster.num.lab slurmctld[2592430]: slurmctld: accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 6817 with slurmdbd
Apr 09 16:18:33 MyCluster.num.lab slurmctld[2592430]: slurmctld: priority/multifactor: _read_last_decay_ran: No last decay (/var/spool/slurmctld/priority_last_decay_ran) to recov>
Apr 09 16:18:33 MyCluster.num.lab slurmctld[2592430]: slurmctld: No memory enforcing mechanism configured.
Apr 09 16:18:33 MyCluster.num.lab slurmctld[2592430]: slurmctld: error: mysql_real_connect failed: 1045 Access denied for user 'root'@'localhost' (using password: NO)
Apr 09 16:18:33 MyCluster.num.lab slurmctld[2592430]: slurmctld: fatal: You haven't inited this storage yet.
Apr 09 16:18:33 MyCluster.num.lab systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Apr 09 16:18:33 MyCluster.num.lab systemd[1]: slurmctld.service: Failed with result 'exit-code'.
I am using mariadb Ver 15.1 Distrib 10.6.16-MariaDB, for debian-linux-gnu (x86_64) using EditLine wrapper and I previously tried the following fix:
MariaDB [(none)]> ALTER USER 'root'@'localhost' IDENTIFIED BY 'PASSWD';
MariaDB [(none)]> flush privileges;
MariaDB [(none)]> USE mysql;
MariaDB [mysql]> SELECT User, Host, plugin FROM mysql.user;
+-------------+-----------+-----------------------+
| User | Host | plugin |
+-------------+-----------+-----------------------+
| mariadb.sys | localhost | mysql_native_password |
| root | localhost | mysql_native_password |
| mysql | localhost | mysql_native_password |
| slurm | localhost | mysql_native_password |
| slurm | system0 | mysql_native_password |
+-------------+-----------+-----------------------+
quit
#systemctl restart mariadb
But then I got:
slurmctld: error: mysql_real_connect failed: 1698 Access denied for user 'root'@'localhost'
I expect slurmctld.service to be active, like the other services, slurmd and slurmdbd.
2 Answers
It was actually due to the field
JobCompType=jobcomp/mysql
in slurm.conf. After changing this field to JobCompType=jobcomp/none, the slurmctld service starts correctly. Thanks for your help!
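For anyone who wants to script that change, here is a minimal sketch. The helper name disable_jobcomp_mysql and the path in the usage comment are assumptions, not part of Slurm, and the match is case-sensitive on the key name. It keeps a .bak copy of the file before rewriting it.

```shell
# disable_jobcomp_mysql: hypothetical helper that sets JobCompType to
# jobcomp/none in the slurm.conf given as first argument.
disable_jobcomp_mysql() {
    conf="$1"
    # Keep a backup so the change can be reverted.
    cp "$conf" "$conf.bak" || return 1
    # Rewrite any uncommented JobCompType line; all other lines pass through.
    sed 's|^[[:space:]]*JobCompType[[:space:]]*=.*|JobCompType=jobcomp/none|' \
        "$conf.bak" > "$conf"
}

# Usage (the path is an assumption, adjust it to your installation):
#   disable_jobcomp_mysql /etc/slurm/slurm.conf && systemctl restart slurmctld
```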
Below is part of the slurm.conf if other people want to compare
The controller seems to be configured to use the slurmdbd service for accounting (accounting_storage/slurmdbd), yet it tries to access the MySQL database directly. I therefore guess the jobcomp/mysql plugin is active too. The error message seems to indicate that JobCompPass is not set.
You could therefore either
- disable the jobcomp/mysql plugin, as it is redundant with the accounting_storage/slurmdbd plugin; or
- set JobCompPass to the password of the root MySQL user
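
To check which of the two cases applies, a quick look at slurm.conf is enough. Below is a rough sketch of that check. The function name check_jobcomp and its messages are hypothetical. As simplifications, it takes the last uncommented line for each key, strips anything after a '#', and matches key names and values case-sensitively.

```shell
# check_jobcomp: hypothetical helper that reports whether the given slurm.conf
# enables jobcomp/mysql and whether JobCompPass is set.
check_jobcomp() {
    conf="$1"
    # Value of the last uncommented "Key=value" line, comments and spaces removed.
    get_value() {
        grep "^[[:space:]]*$1[[:space:]]*=" "$conf" | tail -n 1 \
            | sed 's/#.*//' | cut -d= -f2- | tr -d '[:space:]'
    }
    type=$(get_value JobCompType)
    pass=$(get_value JobCompPass)
    if [ "$type" != "jobcomp/mysql" ]; then
        echo "jobcomp/mysql not enabled"
    elif [ -z "$pass" ]; then
        echo "jobcomp/mysql without JobCompPass"
    else
        echo "jobcomp/mysql with JobCompPass"
    fi
}

# Usage (the path is an assumption, adjust it to your installation):
#   check_jobcomp /etc/slurm/slurm.conf
```

If it prints "jobcomp/mysql without JobCompPass", the configuration matches the situation described above.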