Launch Plugin API
Overview
This document describes the launch plugins that are responsible for launching parallel tasks in Slurm and the API that defines them. It is intended as a resource for programmers wishing to write their own launch plugin.
const char plugin_name[]="launch Slurm plugin"
const char plugin_type[]="launch/[aprun|poe|runjob|slurm]"
- aprun: Use Cray's aprun command to launch tasks; used on Cray systems with ALPS installed.
- poe: Use IBM's poe command to launch tasks; used on systems with IBM's Parallel Environment (PE) installed.
- runjob: Use IBM's runjob command to launch tasks; used on BlueGene/Q systems.
- slurm: Use Slurm's default launching infrastructure.
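As a sketch of how the plugin_type string distinguishes the variants listed above, the C fragment below defines the two identification symbols for launch/slurm, plus a hypothetical helper, launcher_command(), that maps a type string to the external command that variant wraps. The helper is illustrative only and is not part of Slurm's API.

```c
#include <stddef.h>
#include <string.h>

/* Identification symbols, shown for the launch/slurm variant. */
const char plugin_name[] = "launch Slurm plugin";
const char plugin_type[] = "launch/slurm";

/* Hypothetical helper: return the external command a launch plugin type
 * wraps, or NULL for launch/slurm (Slurm's own infrastructure) and for
 * strings that are not launch plugin types at all. */
const char *launcher_command(const char *type)
{
    static const char prefix[] = "launch/";
    size_t plen = sizeof(prefix) - 1;

    if (type == NULL || strncmp(type, prefix, plen) != 0)
        return NULL;
    type += plen;
    if (strcmp(type, "aprun") == 0)
        return "aprun";    /* Cray systems with ALPS */
    if (strcmp(type, "poe") == 0)
        return "poe";      /* IBM Parallel Environment */
    if (strcmp(type, "runjob") == 0)
        return "runjob";   /* BlueGene/Q */
    return NULL;           /* launch/slurm: no external launcher */
}
```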
const uint32_t plugin_version
If specified, identifies the version of Slurm used to build this plugin and
any attempt to load the plugin from a different version of Slurm will result
in an error.
If not specified, then the plugin may be loaded by Slurm commands and
daemons from any version; however, this may result in difficult-to-diagnose
failures due to changes in the arguments to plugin functions or changes
in other Slurm functions used by the plugin.
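The version pinning described above can be sketched as follows. EXAMPLE_VERSION_NUM and example_version_ok() are hypothetical: a real plugin takes its version value from Slurm's headers, and Slurm's loader may compare versions differently from the simple exact match assumed here.

```c
#include <stdint.h>

/* Hypothetical encoding for illustration; not taken from Slurm's headers. */
#define EXAMPLE_VERSION_NUM(major, minor, micro) \
    (((uint32_t)(major) << 16) | ((uint32_t)(minor) << 8) | (uint32_t)(micro))

/* Defining this symbol ties the plugin to the version it was built for. */
const uint32_t plugin_version = EXAMPLE_VERSION_NUM(16, 5, 0);

/* Hypothetical loader-side check. has_version == 0 models a plugin that
 * does not define plugin_version: it loads anywhere, at the user's risk.
 * Otherwise this sketch requires an exact match. */
int example_version_ok(int has_version, uint32_t built_version,
                       uint32_t running_version)
{
    if (!has_version)
        return 1;
    return built_version == running_version;
}
```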
The programmer is urged to study src/plugins/launch/slurm/launch_slurm.c for a sample implementation of a Slurm launch plugin.
API Functions
int init (void)
Description:
Called when the plugin is loaded, before any other functions are
called. Put global initialization here.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
void fini (void)
Description:
Called when the plugin is removed. Clear any allocated storage here.
Returns: None.
Note: These init and fini functions are not the same as those described in the dlopen (3) system library. The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the Slurm init(), and the Slurm fini() is called before the system's _fini().
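A minimal sketch of the init/fini pairing, assuming a plugin that keeps a hypothetical table of per-task exit codes as global state. SLURM_SUCCESS and SLURM_ERROR are defined locally (as 0 and -1) so the fragment compiles on its own; a real plugin gets them from Slurm's headers.

```c
#include <stdlib.h>

/* Local stand-ins for Slurm's return codes. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR (-1)

/* Hypothetical plugin-wide state, created in init() and released in fini(). */
static int *task_exit_codes = NULL;
static size_t task_exit_codes_len = 0;

int init(void)
{
    task_exit_codes_len = 64;   /* arbitrary initial capacity for the sketch */
    task_exit_codes = calloc(task_exit_codes_len, sizeof(*task_exit_codes));
    if (task_exit_codes == NULL) {
        task_exit_codes_len = 0;
        return SLURM_ERROR;
    }
    return SLURM_SUCCESS;
}

void fini(void)
{
    free(task_exit_codes);      /* free(NULL) is harmless if init() failed */
    task_exit_codes = NULL;
    task_exit_codes_len = 0;
}
```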
int launch_p_setup_srun_opt(char **rest)
Description:
Sets up the srun operation.
Arguments:
rest: extra parameters on the command line not processed by srun.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
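The fragment below sketches one way a plugin might use rest: it records the unprocessed arguments so that a later call can hand them to an external launcher. It assumes rest is NULL-terminated like argv; the globals extra_argv and extra_argc and the rule rejecting empty arguments are hypothetical.

```c
#include <stddef.h>

/* Local stand-ins for Slurm's return codes. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR (-1)

/* Hypothetical plugin state: the arguments srun did not consume. */
static char **extra_argv = NULL;
static int extra_argc = 0;

int launch_p_setup_srun_opt(char **rest)
{
    int n = 0;

    /* Assumption: rest is NULL-terminated; NULL means no extra arguments. */
    if (rest != NULL) {
        while (rest[n] != NULL) {
            if (rest[n][0] == '\0')
                return SLURM_ERROR;   /* sketch policy: reject empty words */
            n++;
        }
    }
    extra_argv = rest;
    extra_argc = n;
    return SLURM_SUCCESS;
}
```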
int launch_p_handle_multi_prog_verify(int command_pos)
Description:
Called to verify a multi-prog configuration file, if verification is needed.
Arguments:
command_pos: used with the global opt variable to indicate the position of the command in opt.argv.
Returns:
1 if handled, or
0 if not.
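A sketch of the handled/not-handled contract, assuming a hypothetical example_opt structure in place of srun's global opt variable. "Verification" here is reduced to checking that the multi-prog file named at opt.argv[command_pos] can be opened; the multi_prog_valid flag is also hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for srun's global options (only the fields used). */
struct example_opt {
    bool multi_prog;   /* true when a multi-prog run was requested */
    int argc;
    char **argv;
};
static struct example_opt opt;

/* Hypothetical outcome of the most recent verification. */
static bool multi_prog_valid = false;

int launch_p_handle_multi_prog_verify(int command_pos)
{
    FILE *fp;

    if (!opt.multi_prog)
        return 0;   /* nothing for this plugin to verify */

    multi_prog_valid = false;
    if (command_pos >= 0 && command_pos < opt.argc &&
        opt.argv[command_pos] != NULL) {
        /* Sketch-level check: the configuration file is readable. */
        fp = fopen(opt.argv[command_pos], "r");
        if (fp != NULL) {
            multi_prog_valid = true;
            fclose(fp);
        }
    }
    return 1;       /* handled, whatever the outcome of the check */
}
```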
int launch_p_create_job_step(srun_job_t *job, bool use_all_cpus, void (*signal_function)(int), sig_atomic_t *destroy_job)
Description:
Creates the job step.
Arguments:
job: the job to run.
use_all_cpus: whether to use all CPUs.
signal_function: function that handles incoming signals.
destroy_job: pointer to a global flag indicating whether the job was canceled during allocation.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int launch_p_step_launch(srun_job_t *job, slurm_step_io_fds_t *cio_fds, uint32_t *global_rc)
Description:
Launches the job step.
Arguments:
job: the job to launch.
cio_fds: filled-in I/O descriptors.
global_rc: srun's global return code.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
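For a plugin that wraps an external launcher, as the aprun, poe and runjob variants do, step launch could look roughly like this: fork, point the child's standard streams at the descriptors in cio_fds, and exec the command. Both structures are hypothetical simplifications; Slurm's real slurm_step_io_fds_t is more detailed.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Local stand-ins for Slurm's return codes. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR (-1)

/* Hypothetical, heavily simplified stand-ins for srun's types. */
typedef struct {
    char **argv;   /* launcher command line, NULL-terminated */
    pid_t pid;     /* launcher process (output) */
} srun_job_t;
typedef struct {
    int in, out, err;   /* already-open descriptors for the step's stdio */
} slurm_step_io_fds_t;

int launch_p_step_launch(srun_job_t *job, slurm_step_io_fds_t *cio_fds,
                         uint32_t *global_rc)
{
    pid_t pid;

    if (job == NULL || job->argv == NULL || job->argv[0] == NULL ||
        cio_fds == NULL || global_rc == NULL)
        return SLURM_ERROR;

    pid = fork();
    if (pid < 0)
        return SLURM_ERROR;
    if (pid == 0) {
        /* Child: dup2() leaves a descriptor alone if it already matches. */
        if (dup2(cio_fds->in, STDIN_FILENO) < 0 ||
            dup2(cio_fds->out, STDOUT_FILENO) < 0 ||
            dup2(cio_fds->err, STDERR_FILENO) < 0)
            _exit(127);
        execvp(job->argv[0], job->argv);
        _exit(127);   /* reached only if exec failed */
    }
    job->pid = pid;   /* parent: remember the launcher for later calls */
    *global_rc = 0;   /* no task has reported a failure yet */
    return SLURM_SUCCESS;
}
```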
int launch_p_step_wait(srun_job_t *job, bool got_alloc)
Description:
Waits for the job to finish.
Arguments:
job: the job to wait for.
got_alloc: whether the resource allocation was created inside srun.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
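A matching sketch of waiting on such a launcher process with waitpid(). The srun_job_t fields and example_spawn() are hypothetical: example_spawn() only stands in for the launch call so the fragment can be exercised on its own, and got_alloc is ignored here.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Local stand-ins for Slurm's return codes. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR (-1)

/* Hypothetical stand-in for srun's job record. */
typedef struct {
    pid_t pid;       /* launcher process started by the step launch */
    int exit_code;   /* filled in when the step finishes */
} srun_job_t;

int launch_p_step_wait(srun_job_t *job, bool got_alloc)
{
    int status;
    pid_t rc;

    (void)got_alloc;   /* not used by this sketch */
    if (job == NULL || job->pid <= 0)
        return SLURM_ERROR;
    do {
        rc = waitpid(job->pid, &status, 0);
    } while (rc < 0 && errno == EINTR);   /* retry if a signal interrupted us */
    if (rc < 0)
        return SLURM_ERROR;
    if (WIFEXITED(status))
        job->exit_code = WEXITSTATUS(status);
    else if (WIFSIGNALED(status))
        job->exit_code = 128 + WTERMSIG(status);   /* shell-style convention */
    job->pid = 0;
    return SLURM_SUCCESS;
}

/* Hypothetical demo helper standing in for the launch: a child that exits. */
pid_t example_spawn(int exit_code)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(exit_code);
    return pid;
}
```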
int launch_p_step_terminate(void)
Description:
Terminates the job step.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
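Because launch_p_step_terminate() takes no arguments, a plugin has to keep whatever it needs to find the running step in its own state. The sketch assumes a hypothetical global step_pid and chooses SIGKILL followed by waitpid(); example_start_step() is a demonstration helper that starts a child which waits for a signal.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Local stand-ins for Slurm's return codes. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR (-1)

/* Hypothetical plugin-global: pid of the launcher process for the step. */
static pid_t step_pid = 0;

int launch_p_step_terminate(void)
{
    if (step_pid <= 0)
        return SLURM_ERROR;              /* no step to terminate */
    if (kill(step_pid, SIGKILL) < 0)
        return SLURM_ERROR;
    /* Reap the killed process so it does not linger as a zombie. */
    if (waitpid(step_pid, NULL, 0) < 0)
        return SLURM_ERROR;
    step_pid = 0;
    return SLURM_SUCCESS;
}

/* Hypothetical demo helper standing in for the step launch. */
pid_t example_start_step(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        pause();     /* wait until a signal ends the process */
        _exit(0);
    }
    step_pid = pid;
    return pid;
}
```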
void launch_p_print_status(void)
Description:
Prints the status of the job.
void launch_p_fwd_signal(int signal)
Description:
Forwards a signal to any underlying tasks.
Arguments:
signal: the signal to forward.
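A sketch of signal forwarding under the same kind of hypothetical step_pid global: the signal is passed to kill() unchanged, and since the function returns void, a failed kill() is ignored. The two example_ helpers exist only to demonstrate the call.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical plugin-global: pid of the launcher process for the step. */
static pid_t step_pid = 0;

void launch_p_fwd_signal(int signal)
{
    /* Best effort: the API returns nothing, so errors are not reported. */
    if (step_pid > 0)
        (void)kill(step_pid, signal);
}

/* Hypothetical demo helper: start a child that waits for a signal. */
pid_t example_start_step(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        pause();
        _exit(0);
    }
    step_pid = pid;
    return pid;
}

/* Hypothetical demo helper: reap the step and return the signal that ended
 * it, 0 if it exited normally, or -1 on error. */
int example_reap_step(void)
{
    int status;

    if (step_pid <= 0 || waitpid(step_pid, &status, 0) < 0)
        return -1;
    step_pid = 0;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```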
Last modified 11 February 2016