The MPI system attempts to clean up failed jobs by terminating
all processes if any one of them fails.
However, MPI's automatic cleanup can take time and cannot
always detect failures. This sometimes leaves orphaned
processes running on the system without the scheduler's knowledge,
where they can interfere with other jobs.
Hence, it is each programmer's responsibility to make sure
that their MPI programs do not leave orphaned processes
hanging around on the cluster. How this is accomplished
depends on the particular program, but every MPI
program should follow these general rules:
-
Check the return status of every
MPI function/subroutine call and every other statement
in the program that could fail in a way that would prevent
the program from running successfully. Other examples
include memory allocations and file opens, reads, and writes.
These are only examples, however. It is the programmer's
responsibility to examine every line of code and check for
possible failures. There should be
no exceptions to this rule.
-
Whenever a statement fails, the program should detect the failure
and take appropriate action.
If you cannot determine a course of action that would allow
the program to continue, then simply perform
any necessary cleanup work and terminate the process immediately.
MPI programs that do not terminate themselves may leave orphaned
processes running on the cluster, where they interfere with other
users' jobs.
-
How can an MPI program ensure that it doesn't leave orphaned
processes running?