The next question is how to obtain the sample run time needed to predict longer run times.
In Unix, the time can be used to measure the actual time, CPU time, and system time of any program. To use it, we simply place the command "time" before the command to be benchmarked:
408: f90 -O selsort.f90 -o selsort 409: time selsort 50000nums > output 4.985u 0.044s 0:05.03 99.8% 10+1147k 0+12io 0pf+0w
This shows tha the program used 4.98 seconds of user time (the program itself running), 0.044 seconds of system time (time spent by the operating system providing service to the program, such as reading the input file), and took 5.03 seconds of "wall clock time", or "real time" to complete from the moment the command was entered.
If the program is the only intensive process running, then real time should be about equal to user time + system time. If there are other programs running that compete for the CPU, memory, or disk, the real time could be significantly longer than the user + system time.
The time command is an easy way to determine run time for an entire process, but what if we want to determine run time for an individual subprogram? Large programs may implement many algorithms, and the time command does not allow us to see how much time each of them is using.
Profiling is a feature of most Unix compilers that allows
us to see how much time each subprogram used. To use profiling,
we must first compile the program with the -p
flag (or -pg
for GNU compilers. This causes
the compiler to insert code that checks the time before and
after every subprogram call. The results of these measurements
are saved in a file called mon.out
.
We then use the prof command to see the
results.
In the profile below, we can see that the sort_list() subroutine used most of the CPU time in our selection sort program.
weise bacon ~ 413: f90 -O -p selsort.f90 weise bacon ~ 414: ./selsort 50000nums > output weise bacon ~ 415: prof ./selsort %Time Seconds Cumsecs #Calls msec/call Name 94.9 2.60 2.60 1 2600. sort_list_ 2.2 0.06 2.66 __f90_slw_ia32 1.1 0.03 2.69 50000 0.0006 __f_cvt_real 0.7 0.02 2.71 __mt_prof_release_lock 0.4 0.01 2.72 __f90_open_for_output_r 0.4 0.01 2.73 50001 0.0002 __f90_sslr 0.4 0.01 2.74 50001 0.0002 __f90_eslw 0.0 0.00 2.74 1 0. main 0.0 0.00 2.74 1 0. read_list_ 0.0 0.00 2.74 1 0. MAIN_ 0.0 0.00 2.74 1 0. print_list_ 0.0 0.00 2.74 3 0. __getarg_ 0.0 0.00 2.74 1 0. __f90_deallocate 0.0 0.00 2.74 1 0. __f90_allocate2 0.0 0.00 2.74 1 0. f90_init 0.0 0.00 2.74 50001 0.0000 __f90_sslw 0.0 0.00 2.74 1 0. __f90_slw_ch 0.0 0.00 2.74 50000 0.0000 __f90_slw_r8 0.0 0.00 2.74 1 0. __f90_init 0.0 0.00 2.74 1 0. __f90_slr_i4 0.0 0.00 2.74 50000 0.0000 __f90_slr_r8 0.0 0.00 2.74 50001 0.0000 __f90_eslr 0.0 0.00 2.74 50001 0.0000 __f90_write_a 0.0 0.00 2.74 50001 0.0000 __f90_get_default_output_unit 0.0 0.00 2.74 50000 0.0000 __f90_get_numbered_unit_a 0.0 0.00 2.74 1 0. __f90_get_numbered_unit_r 0.0 0.00 2.74 100004 0.0000 __f90_release_unit 0.0 0.00 2.74 1 0. __f90_open 0.0 0.00 2.74 1 0. __f90_close_unit_a 0.0 0.00 2.74 1 0. __f90_close_a 0.0 0.00 2.74 1 0. __f90_close 0.0 0.00 2.74 1 0. __f90_initio_a 0.0 0.00 2.74 1 0. __f90_opencat