MPI programming is complex and not necessary for all cluster users; many will only run HTC jobs, which require no communication between processes. The goal of this example is to provide a very simple introduction to MPI programming for those who are interested. We assume some familiarity with C or Fortran.
Users interested in pursuing MPI programming are encouraged to consult a book on the subject. Many good resources are cited on the HPC website at http://www4.uwm.edu/hpc/related_resources/.
C version:
/*
 *  Program description:
 *
 *  MPI Example
 *
 *  A typical MPI job will start N processes, all running the same
 *  program.  This is known as the Single-Program Multiple-Data (SPMD)
 *  model.
 *
 *  This program demonstrates how the various processes
 *  distinguish themselves from the rest using MPI library functions.
 *
 *  Return values of all MPI functions are checked for good practice,
 *  although some MPI functions will by default terminate the process
 *  without returning if an error is encountered.  This behavior can be
 *  changed, so including error checks for every MPI call is a good idea.
 */

/*
 *  Modification history:
 *  Date        Name        Modification
 *  2011-08-24  Jason Bacon Begin
 */

#include <stdio.h>      /* fputs(), printf(), etc. */
#include <stdlib.h>     /* exit() */
#include <sysexits.h>   /* EX_ exit constants */
#include <sys/param.h>  /* MAXHOSTNAMELEN */
#include <unistd.h>     /* gethostname() */
#include <mpi.h>        /* MPI functions and constants */

#define ROOT_RANK       0
#define MESSAGE_MAX_LEN 128
#define TAG             0

int     main(int argc, char *argv[])

{
    int         total_processes;
    int         my_rank;
    int         rank;
    MPI_Status  status;
    char        message[MESSAGE_MAX_LEN + 1];
    char        hostname[MAXHOSTNAMELEN + 1];

    /* Get name of node running this process */
    if ( gethostname(hostname, MAXHOSTNAMELEN) != 0 )
    {
        fputs("gethostname() failed.\n", stderr);
        exit(EX_OSERR);
    }

    /* Initialize data for the MPI functions */
    if ( MPI_Init(&argc, &argv) != MPI_SUCCESS )
    {
        fputs("MPI_Init failed.\n", stderr);
        exit(EX_UNAVAILABLE);
    }

    /* Find out how many processes are in this MPI job */
    if ( MPI_Comm_size(MPI_COMM_WORLD, &total_processes) != MPI_SUCCESS )
    {
        fputs("MPI_Comm_size failed.\n", stderr);
        exit(EX_UNAVAILABLE);
    }

    /*
     *  Each process within the job has a unique integer "rank".
     *  This is how each process determines its role within the job.
     */
    if ( MPI_Comm_rank(MPI_COMM_WORLD, &my_rank) != MPI_SUCCESS )
    {
        fputs("MPI_Comm_rank failed.\n", stderr);
        exit(EX_UNAVAILABLE);
    }

    /*
     *  For this job, the process with rank 0 will assume the role of
     *  the "root" process, which will run different code than the
     *  other processes.
     */
    if (my_rank == ROOT_RANK)
    {
        printf("We have %d processes.\n", total_processes);

        /* Send a message to all non-root processes */
        for (rank = 1; rank < total_processes; ++rank)
        {
            snprintf(message, MESSAGE_MAX_LEN,
                     "Process %d, where are you?", rank);
            if ( MPI_Send(message, MESSAGE_MAX_LEN, MPI_CHAR, rank, TAG,
                          MPI_COMM_WORLD) != MPI_SUCCESS )
            {
                fputs("MPI_Send failed.\n", stderr);
                exit(EX_UNAVAILABLE);
            }
        }

        /* Read the response from all non-root processes */
        for (rank = 1; rank < total_processes; ++rank)
        {
            if ( MPI_Recv(message, MESSAGE_MAX_LEN, MPI_CHAR, rank, TAG,
                          MPI_COMM_WORLD, &status) != MPI_SUCCESS )
            {
                fputs("MPI_Recv failed.\n", stderr);
                exit(EX_UNAVAILABLE);
            }
            printf("%s\n", message);
        }
    }
    else
    {
        /* Wait for message from root process */
        if ( MPI_Recv(message, MESSAGE_MAX_LEN, MPI_CHAR, ROOT_RANK, TAG,
                      MPI_COMM_WORLD, &status) != MPI_SUCCESS )
        {
            fputs("MPI_Recv failed.\n", stderr);
            exit(EX_UNAVAILABLE);
        }
        printf("Process %d received message: %s\n", my_rank, message);

        /* Send response */
        snprintf(message, MESSAGE_MAX_LEN, "Process %d is on %s",
                 my_rank, hostname);
        if ( MPI_Send(message, MESSAGE_MAX_LEN, MPI_CHAR, ROOT_RANK, TAG,
                      MPI_COMM_WORLD) != MPI_SUCCESS )
        {
            fputs("MPI_Send failed.\n", stderr);
            exit(EX_UNAVAILABLE);
        }
    }

    /*
     *  All MPI processes must execute MPI_Finalize() to synchronize
     *  the job before they exit.
     */
    if ( MPI_Finalize() != MPI_SUCCESS )
    {
        fputs("MPI_Finalize failed.\n", stderr);
        exit(EX_UNAVAILABLE);
    }
    return EX_OK;
}
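Output from a run with 4 processes will resemble the following. The hostnames shown here are hypothetical, and the relative ordering of lines from different processes may vary from run to run:

We have 4 processes.
Process 1 received message: Process 1, where are you?
Process 2 received message: Process 2, where are you?
Process 3 received message: Process 3, where are you?
Process 1 is on compute-001
Process 2 is on compute-001
Process 3 is on compute-002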
Fortran version:
!-----------------------------------------------------------------------
!   Program description:
!
!   MPI Example
!
!   A typical MPI job will start N processes, all running the same
!   program.  This is known as the Single-Program Multiple-Data (SPMD)
!   model.
!
!   This program demonstrates how the various processes
!   distinguish themselves from the rest using MPI library functions.
!-----------------------------------------------------------------------

!-----------------------------------------------------------------------
!   Modification history:
!   Date        Name        Modification
!   2011-08-24  Jason Bacon Begin
!-----------------------------------------------------------------------

module constants
    ! Global Constants
    double precision, parameter :: &
        PI = 3.1415926535897932d0, &
        E = 2.7182818284590452d0, &
        TOLERANCE = 0.00000000001d0, & ! For numerical methods
        AVOGADRO = 6.0221415d23     ! Not known to more digits than this
    integer, parameter :: &
        MESSAGE_MAX_LEN = 128, &
        HOSTNAME_MAX_LEN = 128, &
        TAG = 0, &
        ROOT_RANK = 0
end module constants

! Main program body
program mpi_hello
    use constants           ! Constants defined above
    use ISO_FORTRAN_ENV     ! INPUT_UNIT, OUTPUT_UNIT, ERROR_UNIT, etc.

    ! Disable implicit declarations (i-n rule)
    implicit none

    include 'mpif.h'        ! MPI constants

    ! Variable definitions
    character(MESSAGE_MAX_LEN) :: message       ! MPI message buffer
    character(HOSTNAME_MAX_LEN) :: hostname     ! Name of node
    integer :: total_processes, my_rank, rank, count, ierr
    integer :: status(MPI_STATUS_SIZE)

    ! Get name of node running this process
    call hostnm(hostname)

    ! Initialize data for the MPI functions
    call mpi_init(ierr)
    if ( ierr /= MPI_SUCCESS ) stop 'mpi_init failed.'

    ! Find out how many processes are in this MPI job
    call mpi_comm_size(MPI_COMM_WORLD, total_processes, ierr)
    if ( ierr /= MPI_SUCCESS ) stop 'mpi_comm_size failed.'

    ! Each process within the job has a unique integer "rank".
    ! This is how each process determines its role within the job.
    call mpi_comm_rank(MPI_COMM_WORLD, my_rank, ierr)
    if ( ierr /= MPI_SUCCESS ) stop 'mpi_comm_rank failed.'

    ! For this job, the process with rank ROOT_RANK will assume the role
    ! of the "root" process, which will run different code than the
    ! other processes.
    if ( my_rank == ROOT_RANK ) then
        ! Only root process runs this clause
        ! Do this in root so it only prints once
        print '(a,i0,a)', 'We have ', total_processes, ' processes.'

        ! Debug code
        ! print '(a,a,a)', 'Root process running on ', trim(hostname), '.'

        do rank = 1, total_processes-1
            write (message, '(a,i0,a)') 'Process ', rank, ', where are you?'

            ! Debug code
            ! print '(a,a,a,i0,a)', 'Sending ', trim(message), &
            !     ' to process ', rank, '.'

            ! It's wasteful to send a padded string, but it's complicated
            ! for mpi_recv() to receive a message of unknown length.
            ! A smarter program would save network bandwidth and time by
            ! using len_trim(message) instead of MESSAGE_MAX_LEN.
            call mpi_send(message, MESSAGE_MAX_LEN, MPI_CHARACTER, rank, &
                          TAG, MPI_COMM_WORLD, ierr)
            if ( ierr /= MPI_SUCCESS ) stop 'mpi_send failed.'
        enddo

        do count = 1, total_processes-1
            ! Accept messages from the non-root processes in any rank order
            ! by using MPI_ANY_SOURCE for rank in recv call
            call mpi_recv(message, MESSAGE_MAX_LEN, MPI_CHARACTER, &
                          MPI_ANY_SOURCE, MPI_ANY_TAG, &
                          MPI_COMM_WORLD, status, ierr)
            if ( ierr /= MPI_SUCCESS ) stop 'mpi_recv failed.'
            print *, 'Root received response: ', trim(message)
        enddo
    else
        ! All non-root processes run this section

        ! Debug code
        ! print '(a,i0,a,a,a)', 'Process ', my_rank, ' running on ', &
        !     trim(hostname), '.'

        ! Wait for message from root
        call mpi_recv(message, MESSAGE_MAX_LEN, MPI_CHARACTER, ROOT_RANK, &
                      TAG, MPI_COMM_WORLD, status, ierr)
        if ( ierr /= MPI_SUCCESS ) stop 'mpi_recv failed.'
        print '(a,i0,a,a)', 'Process ', my_rank, ' received: ', trim(message)

        ! Respond to message from root
        write (message, '(a,i0,a,a,a)') 'Process ', my_rank, ' is on ', &
            trim(hostname), '.'
        call mpi_send(message, len(message), MPI_CHARACTER, ROOT_RANK, &
                      TAG, MPI_COMM_WORLD, ierr)
        if ( ierr /= MPI_SUCCESS ) stop 'mpi_send failed.'
    endif

    ! All MPI processes must execute mpi_finalize to synchronize
    ! the job before they exit.
    call mpi_finalize(ierr)
    if ( ierr /= MPI_SUCCESS ) stop 'mpi_finalize failed.'
end program
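The two versions behave identically. Note that hostnm() is a compiler extension (supported by gfortran, among others) rather than standard Fortran. The Fortran version is compiled with the Fortran compiler wrapper rather than mpicc; assuming the source file is named mpi-hello.f90:

mpif90 -o mpi-hello mpi-hello.f90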
In a scheduled environment, MPI jobs are submitted much like batch serial jobs. The scheduler is informed of the resource requirements (cores, memory), but it does not dispatch all the processes itself. Instead, it dispatches a single mpirun command, and mpirun then launches all of the processes on the list of nodes provided by the scheduler.
SLURM submit script:
#!/bin/sh -e

# Number of cores
#SBATCH --ntasks=8

mpirun ./mpi-hello
Programs should be compiled in the same environment in which they will run, i.e., on a compute node under the scheduler. This ensures that they find the same tools and libraries at run time as they did at compile time. The best way to achieve this is to compile from a submit script:
SLURM build script:
#!/bin/sh -e

# Number of cores
#SBATCH --ntasks=1

mpicc -o mpi-hello mpi-hello.c
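Assuming these scripts are saved as mpi-hello-build.sbatch and mpi-hello-run.sbatch (the names are arbitrary), the workflow is simply:

sbatch mpi-hello-build.sbatch
# After the build job completes:
sbatch mpi-hello-run.sbatch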
PBS submit script:
#!/bin/sh

# Job name
#PBS -N MPI-Hello

# Number of cores
#PBS -l procs=8

##########################################################################
#   Shell commands
##########################################################################

# Torque starts from the home directory on each node, so we must manually
# cd to the working directory where the hello binary is located.
cd $PBS_O_WORKDIR
mpirun ./mpi-hello
LSF submit script:
#!/usr/bin/env bash

# Job name
#BSUB -J MPI-Hello

# Number of cores
#BSUB -n 8

##########################################################################
#   Shell commands
##########################################################################

# LSF requires the use of wrapper scripts rather than using mpirun directly
openmpi_wrapper ./mpi-hello
As an exercise, write an MPI version of calcpi.c from Chapter 34, Programming for HTC. For simplicity, use a constant for the number of random points generated and use the process rank for the srandom() seed.
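Below is a minimal sketch of one possible solution, assuming calcpi.c estimates pi by generating random points in the unit square and counting the fraction that fall inside the unit circle. The names and structure here are illustrative rather than taken from calcpi.c, and the per-call error checking shown in mpi-hello.c is omitted for brevity:

/*
 *  Sketch: MPI Monte Carlo estimate of pi.  Each process seeds the
 *  random number generator with its rank, generates POINTS random
 *  points in the unit square, and counts those inside the unit circle.
 *  MPI_Reduce() then sums the per-process counts on the root process.
 */

#include <stdio.h>
#include <stdlib.h>     /* srandom(), random() */
#include <sysexits.h>   /* EX_OK */
#include <mpi.h>

#define ROOT_RANK   0
#define POINTS      10000000UL      /* Points per process: a constant for simplicity */
#define RANDOM_MAX  2147483647.0    /* random() returns 0 to 2^31 - 1 */

int     main(int argc, char *argv[])

{
    int             my_rank, total_processes;
    unsigned long   i, hits = 0, total_hits = 0;
    double          x, y;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &total_processes);

    /* Use the rank as the seed so each process samples different points */
    srandom(my_rank);

    for (i = 0; i < POINTS; ++i)
    {
        x = random() / RANDOM_MAX;
        y = random() / RANDOM_MAX;
        if ( x * x + y * y <= 1.0 )
            ++hits;
    }

    /* Sum the hit counts from all processes into total_hits on the root */
    MPI_Reduce(&hits, &total_hits, 1, MPI_UNSIGNED_LONG, MPI_SUM,
               ROOT_RANK, MPI_COMM_WORLD);

    if ( my_rank == ROOT_RANK )
        printf("Estimated pi = %0.6f\n",
               4.0 * total_hits / (POINTS * total_processes));

    MPI_Finalize();
    return EX_OK;
}

Unlike mpi-hello, this program needs no explicit sends and receives: the collective MPI_Reduce() call performs the communication, which is typical of simple data-parallel MPI programs.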