The MPI system attempts to clean up failed jobs by terminating
        all processes in the event that any one of the processes fails.
        However, MPI's automatic cleanup can take time, and cannot
        always detect failures.  This sometimes leads to orphaned
        processes remaining in the system unbeknownst to the scheduler,
        which can cause problems for other jobs.
        
        Hence, it is each programmer's responsibility to make sure
        that their MPI programs do not leave orphaned processes
        hanging around on the cluster.  How this is accomplished
        depends on the particular program, but every MPI
        program should follow these general rules:
        
- 
            Check the exit status of every
            MPI function/subroutine call and every other statement
            in the program that could fail in a way that would prevent
            the program from running successfully.  Other examples
            include memory allocations and file opens, reads, and writes.
            These are only examples, however.  It is the programmer's
            responsibility to examine every line of code and check for
            possible failures.  There should be
            no exceptions to this rule.
            
- 
            Whenever a statement fails, the program should detect the failure
            and take appropriate action.
            If you cannot determine a course of action that would allow
            the program to continue, then simply perform
            any necessary cleanup work and terminate the process immediately.
            MPI programs that do not self-terminate may leave orphaned
            processes running on a cluster that interfere with other users'
            jobs.
            
- 
                How can an MPI program ensure that it doesn't leave orphaned
                processes running?