Performance Measurement

The next question is how to obtain the sample run time needed to predict longer run times.

Timing Programs

In Unix, the time can be used to measure the actual time, CPU time, and system time of any program. To use it, we simply place the command "time" before the command to be benchmarked:

            408: f90 -O selsort.f90 -o selsort
            409: time selsort 50000nums > output
            4.985u 0.044s 0:05.03 99.8%     10+1147k 0+12io 0pf+0w
            

This shows tha the program used 4.98 seconds of user time (the program itself running), 0.044 seconds of system time (time spent by the operating system providing service to the program, such as reading the input file), and took 5.03 seconds of "wall clock time", or "real time" to complete from the moment the command was entered.

If the program is the only intensive process running, then real time should be about equal to user time + system time. If there are other programs running that compete for the CPU, memory, or disk, the real time could be significantly longer than the user + system time.

Profiling: Timing Subprograms

The time command is an easy way to determine run time for an entire process, but what if we want to determine run time for an individual subprogram? Large programs may implement many algorithms, and the time command does not allow us to see how much time each of them is using.

Profiling is a feature of most Unix compilers that allows us to see how much time each subprogram used. To use profiling, we must first compile the program with the -p flag (or -pg for GNU compilers. This causes the compiler to insert code that checks the time before and after every subprogram call. The results of these measurements are saved in a file called mon.out. We then use the prof command to see the results.

In the profile below, we can see that the sort_list() subroutine used most of the CPU time in our selection sort program.

    weise bacon ~ 413: f90 -O -p selsort.f90
    weise bacon ~ 414: ./selsort 50000nums > output
    weise bacon ~ 415: prof ./selsort
     %Time Seconds Cumsecs  #Calls   msec/call  Name
      94.9    2.60    2.60       1   2600.      sort_list_
       2.2    0.06    2.66                      __f90_slw_ia32
       1.1    0.03    2.69   50000      0.0006  __f_cvt_real
       0.7    0.02    2.71                      __mt_prof_release_lock
       0.4    0.01    2.72                      __f90_open_for_output_r
       0.4    0.01    2.73   50001      0.0002  __f90_sslr
       0.4    0.01    2.74   50001      0.0002  __f90_eslw
       0.0    0.00    2.74       1      0.      main
       0.0    0.00    2.74       1      0.      read_list_
       0.0    0.00    2.74       1      0.      MAIN_
       0.0    0.00    2.74       1      0.      print_list_
       0.0    0.00    2.74       3      0.      __getarg_
       0.0    0.00    2.74       1      0.      __f90_deallocate
       0.0    0.00    2.74       1      0.      __f90_allocate2
       0.0    0.00    2.74       1      0.      f90_init
       0.0    0.00    2.74   50001      0.0000  __f90_sslw
       0.0    0.00    2.74       1      0.      __f90_slw_ch
       0.0    0.00    2.74   50000      0.0000  __f90_slw_r8
       0.0    0.00    2.74       1      0.      __f90_init
       0.0    0.00    2.74       1      0.      __f90_slr_i4
       0.0    0.00    2.74   50000      0.0000  __f90_slr_r8
       0.0    0.00    2.74   50001      0.0000  __f90_eslr
       0.0    0.00    2.74   50001      0.0000  __f90_write_a
       0.0    0.00    2.74   50001      0.0000  __f90_get_default_output_unit
       0.0    0.00    2.74   50000      0.0000  __f90_get_numbered_unit_a
       0.0    0.00    2.74       1      0.      __f90_get_numbered_unit_r
       0.0    0.00    2.74  100004      0.0000  __f90_release_unit
       0.0    0.00    2.74       1      0.      __f90_open
       0.0    0.00    2.74       1      0.      __f90_close_unit_a
       0.0    0.00    2.74       1      0.      __f90_close_a
       0.0    0.00    2.74       1      0.      __f90_close
       0.0    0.00    2.74       1      0.      __f90_initio_a
       0.0    0.00    2.74       1      0.      __f90_opencat