C, C++, and Fortran are compiled languages: programs written in them are translated entirely to machine language before being executed.
In all three languages, production of an executable file involves up to three steps, outlined below and in Figure 16.1, “Compilation”.
Preprocessing: This step runs the source code through a stream editor called the preprocessor, which is designed specifically for editing source code. The preprocessor makes modifications such as inserting the contents of header files and replacing named constants with their values, and outputs modified source code.
The preprocessor command is usually cpp (short for C PreProcessor).
The preprocessor is described in detail in the section called “The C Preprocessor”.
Compilation: This step translates the preprocessed source code into machine language, producing an object file (usually with a ".o" extension). An object file contains machine code, but is not yet a complete program.
Linking: This step combines the object files from the compilation step with other object files stored in libraries (precompiled collections of functions) and the machine code needed to start a program. The result is an executable file such as /bin/ls or any other Unix command.
The linker program is usually called ld. An example of a library is /usr/lib/libc.so, the standard C library. It contains the object files for many standard functions used in the C language, such as printf(), scanf(), qsort(), strcpy(), etc.
You generally do not need to run these steps individually. They are executed automatically in sequence when you run a compiler such as cc, clang, gcc, or gfortran.
Every Unix system with a C compiler has a cc command. On FreeBSD and macOS, cc is equivalent to clang. On Linux systems, cc is equivalent to gcc. On some commercial Unix systems, cc is a proprietary compiler developed by the vendor.
Clang/LLVM and GCC are both open source compiler suites and are highly compatible with each other. They support most of the same command-line flags, such as -Wall, which enables a large set of commonly useful warning messages (despite its name, not every warning the compiler offers).
Unfortunately, there is no standard compiler name for Fortran, since most Unix systems don't include a Fortran compiler. Fortran is usually added in the form of f2c (a Fortran 77 to C translator), gfortran (the open source GNU Fortran compiler), or flang (the open source Clang/LLVM Fortran compiler). There are also several commercial Fortran compilers available.
C source files have an extension of ".c". C++ files usually use ".cc", ".cpp", ".cxx", or ".c++".
Fortran files use ".f", ".F", ".for", or ".FOR" for Fortran 77, ".f90" or ".F90" for Fortran 90, ".f03" or ".F03" for Fortran 2003, and ".f08" or ".F08" for Fortran 2008.
Examples of building an executable file from a single source file:
shell-prompt: cc jumping-genes.c
shell-prompt: gfortran gauss.f90
The commands above will produce an executable file called a.out. This is the default for most Unix compilers. The executable file is also sometimes called a binary file. This is why program directories on Unix systems are named "bin" (/bin, /usr/bin, /usr/local/bin, etc.).
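To run the binary program, we simply type its path name followed by any arguments that it requires. Because the current directory is normally not in the shell's search path, a program in the current directory is run by prefixing its name with "./":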
shell-prompt: ./a.out
Most Unix compilers also support the -o flag to specify a different output file name.
shell-prompt: cc jumping-genes.c -o jumping-genes
shell-prompt: ./jumping-genes
shell-prompt: gfortran gauss.f90 -o gauss
shell-prompt: ./gauss
All Unix compilers support certain common flags. The -g flag tells the compiler to include debugging information in the binary file, so that a debugger (a program that helps you find problems) can determine the location of a problem in the source code while examining an executable.
The -O, -O2, and -O3 flags tell the compiler to turn on standard levels of object code optimization.
shell-prompt: cc -O2 jumping-genes.c -o jumping-genes
shell-prompt: gfortran -O3 gauss.f90 -o gauss
You can keep your life simple by compiling with cc, rather than specifically using clang or gcc, and using portable flags such as -O.
The -Wall flag is not entirely portable, but is supported by both clang and gcc, which are by far the most popular compilers. Compiling with -Wall is extremely helpful for catching potential program bugs.
Using -O will usually improve the speed of your executable file significantly and will often reduce its size as well. The -O2 flag will usually offer only a marginal improvement over -O (and is actually the same with some compilers), and -O3 will usually provide little or no benefit over -O2.
Higher optimization levels like -O3 may also impede debugging, since they may reorganize the machine code in ways that make it impossible to determine which line of source code a given machine instruction came from.
Generally, the higher the level of optimization, the more dangerous and less beneficial the optimizations will be. Using -O will include all optimizations considered to be very safe and will provide the vast majority of all the performance benefit that's possible.
You can also enable specific optimizations using other command-line flags, but such flags may not work with all compilers and may produce executables that will not run on older CPUs of the same family.
The -O flags all aim to generate portable executables that will run on any machine in the same family of processors that is likely to still be in use. For example, compiling with -O2 on the latest AMD or Intel processor will generate an executable that should work on any recent Intel or AMD processor produced in the last several years.
In rare cases, you may see noticeably better performance by utilizing the latest processor features. Clang and GCC make this relatively easy with the -march=native flag:
shell-prompt: clang -O2 -march=native super-analyzer.c
For most programs, this will make very little difference in speed. In extremely rare cases, it may reduce run time by as much as 30%. The executable produced will not work on older processors, however. You also need to be using a compiler that is new enough to support all of your bleeding-edge processor's features.
Before committing to anything more sophisticated than -O2, compare the run time of your program when compiled with various options to see if it's really worth doing. This can be easily done using the time command, as discussed in the section called “Time”.
Libraries, as mentioned above, are collections of precompiled subprograms that we can use in our programs. Libraries are built with the same compilers as our programs (cc, c++, gfortran, flang, etc.). We can create our own libraries as described in Chapter 21, Subprograms. More often, we will use libraries supplied with the compiler or installed via a package manager.
While all languages use libraries, the C language was intentionally designed to rely heavily on them. The C language designers decided not to give the language any features that could be implemented as a library function. This keeps the language very simple, fast, easy to learn, and easy to implement on new hardware. In some cases it makes the program a bit less elegant, but no harder to read in reality.
For example, to compare two strings in many languages, we might write something like the following:
if ( string1 == string2 )
The C language does not directly support string comparison, so for this we use a library function call:
if ( strcmp(string1, string2) == 0 )
Some libraries, such as the standard C library (usually /usr/lib/libc.so), are automatically searched by the linker. For other libraries, such as the standard math library (usually /usr/lib/libm.so), we need to tell the linker to search them by using the -l flag. This flag is immediately followed by the unique portion of the library's file name. For example, to use libm.so, we specify -lm. To use liblzma.so, we would specify -llzma.
cc -O gauss.c -o gauss -lm -llzma
All library file names begin with "lib" and end with common extensions like ".a", ".so", or ".dylib". We omit these parts when using the -l flag.
Add-on libraries, such as those installed by a package manager, may not be in the linker's default search path, so we also need to use -L to tell the linker where to find the library file. This flag is immediately followed by the absolute or relative path of the directory containing the library. For example, to use /usr/local/lib/libblas.a, we would use a compile command like the following:
cc -O gauss.c -o gauss -L/usr/local/lib -lblas -lm
gfortran -O gauss.f90 -o gauss -L/usr/local/lib -lblas -lm
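The order of the -l flags can matter, because the linker may search each library only once, in the order given on the command line. A library should therefore be listed before any libraries it depends on. For example, if a function in the blas library calls a function in the standard math library, then -lm should come after -lblas.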
The compilation process for C++ or Fortran is largely the same as for C. C++ also uses a preprocessor stage. Preprocessing was not part of the original Fortran language, but it has been adopted from C. Fortran compilers can be told to use the C preprocessor by specifying command-line options such as -cpp or by choosing an appropriate filename extension such as .fpp.
C, C++ and Fortran object files are slightly different from each other, but can be linked together to form executables from multiple languages. It is actually quite common for C++ programs to use C libraries, and for C/C++ programs to use Fortran libraries such as BLAS.
C++ programs can be compiled on any system using the c++ command, which is equivalent to clang++ on FreeBSD and macOS, and to g++ on GNU/Linux systems. There is rarely a reason to invoke clang++ or g++ explicitly.
# C++ program using Fortran BLAS library and C math library
shell-prompt: c++ super-analyzer.cxx -L/usr/local/lib -lblas -lm
The most stable Fortran compiler is gfortran. The Flang project aims to develop an open source Fortran compiler companion to clang and clang++, but is still a work-in-progress at the time of this writing. Commercial Fortran compilers also exist. At the time of this writing, Fortran compilers are not as compatible with each other as are clang and gcc, so invoking gfortran directly may be the best option.
What are the three stages in C, C++, and Fortran compilation?
What is the portable way to invoke a C, C++, or Fortran compiler? Contrast it with the non-portable ways.
What is the risk of using optimizations like -march=native?
Show a portable command that compiles the program find-waves.c to an executable called find-waves. The program uses functions from the C math library. Use the best safe and portable optimizations.
Show a compile command that uses GNU Fortran to build find-waves from find-waves.f90. The Fortran version of this program uses the library /usr/local/lib/libblas.so. Use the best safe and portable optimizations.