**NOTE:** NHC doesn’t have to use `slurmd` to do its job. It can be run via cron, another framework, or even a different HPC scheduler (PBS, SGE…). The following settings are only needed if you want `slurmd` to invoke `nhc-wrapper`. The current configuration on Cheaha does use `slurmd`, hence the changes below.
> `HealthCheckProgram`: Fully qualified pathname of a script to execute as user root periodically on all compute nodes that are not in the `NOT_RESPONDING` state. This program may be used to verify the node is fully operational and DRAIN the node or send email if a problem is detected. Any action to be taken must be explicitly performed by the program (e.g. execute "`scontrol update NodeName=foo State=drain Reason=tmp_file_system_full`" to drain a node). The execution interval is controlled using the `HealthCheckInterval` parameter. Note that the `HealthCheckProgram` will be executed at the same time on all nodes to minimize its impact upon parallel programs. This program will be killed if it does not terminate normally within 60 seconds. This program will also be executed when the `slurmd` daemon is first started and before it registers with the `slurmctld` daemon. By default, no program will be executed.
> `HealthCheckInterval`: The interval in seconds between executions of `HealthCheckProgram`. The default value is zero, which disables execution.
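
As a rough illustration of how these two parameters fit together, a `slurm.conf` fragment might look like the sketch below. The install path of `nhc-wrapper` and the 300-second interval are assumptions chosen for the example, not values taken from the Cheaha configuration; substitute whatever applies on your nodes.

```conf
# slurm.conf (fragment) - illustrative sketch only

# ASSUMPTION: nhc-wrapper is installed at this path; adjust to your install location.
HealthCheckProgram=/usr/sbin/nhc-wrapper

# ASSUMPTION: 300 seconds is an example interval. The default of 0 disables the check.
HealthCheckInterval=300
```

Because the program is killed if it does not finish within 60 seconds, the checks it runs need to complete well inside that window. After editing `slurm.conf`, the daemons have to re-read it (for example with `scontrol reconfigure`) before the new settings apply.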