Chapter 16. Introduction to High-Performance Programming


Two for the Price of One

You may be thinking: C and Fortran? At the same time? Are you insane??? I assure you, I am somewhat sane. After years of teaching computer programming and assuming that learning two languages at once would be a calamity, one day by chance I questioned this belief in a daydream.

It occurred to me that this is actually the solution to some major problems. People who only know one language don't really understand the separation between design and implementation, e.g. the concept of a loop and the syntax of a loop. They also fear learning a second language, not realizing that learning the second one is an order of magnitude easier than the first, because you already know the concepts, which are mostly the same across all languages.

Learning the syntaxes of two languages at once immediately clarifies the concepts underlying both of them by providing context, and alleviates many fears about future learning curves. So, I combined my notes from previous courses in C and Fortran, gave them a high-performance software spin, and here we are.

Why C and Fortran?

You may wonder why this book focuses on C and Fortran, given that your colleagues are mostly using Matlab, or Perl, or Python, or R. All of these languages are useful, but as we discuss in the section called “Compiled vs Interpreted Languages”, they are generally an order of magnitude or two slower than C. They are also somewhat domain-specific, with Matlab being used mainly in engineering, R mainly in statistics, and Python in a variety of areas where specific tools provide a Python interface, such as machine learning.

The goal of this text is to empower the reader as broadly as possible. Given that C, C++, and Fortran are many times faster than interpreted languages, it is likely that you'll need to learn one of them anyway, when you cannot get the performance you need from other languages.

Knowing the simple combination of shell scripting and C, you can accomplish anything. If something cannot be done conveniently or efficiently enough in a shell script, you can create a C program and run it from your shell script. People often need to do the same when programming in Matlab, Perl, Python, R, or other interpreted languages.

Given the facts stated above, we focus on scripting and the most common and best performing compiled languages, and leave the many other programming languages to more domain-specific training.

C and C++

C and its offshoot, C++, are the most popular compiled programming languages of recent decades.

C is a very simple but high-level language that is renowned for giving the programmer unlimited power and flexibility, near optimal execution speed, and maximum code density (the amount of functionality per line of code). The creators of C deliberately made it a minimalist, though complete, high-level language by adhering to one rule: Don't add any features to the language that can be implemented by a function (subprogram). The result was a language with only about 30 keywords, no built-in input/output statements, and no built-in mathematical functions.

Because the language is simple, it is easy for anyone to master the language itself. Most of the functionality in C programs comes from libraries, collections of subprograms that are not part of the C compiler, and are mostly written in C. Learning new library functions is easier than learning new language features, because the rules for calling any library function are the same. Once you know these simple rules, learning a library function is just a matter of knowing what it does, what to pass to it (the arguments) and what it sends back (the return value).

Another benefit of the simplicity of C is that the compiler can produce the fastest possible machine code. The programmers developing the C compiler itself don't have a lot of work to do to make the compiler correct, and hence can spend more time on optimizing it. This also means that C programs compile very quickly compared to programs in more complex languages. A typical C compiler can compile 10,000 lines of code in a few seconds.

Code written in C can be easily utilized by code written in other popular languages such as C++, Fortran, MATLAB, Perl, Python, and R, which makes C a good choice for writing general-purpose computational code, especially libraries. Hence, putting functionality into C libraries maximizes not only performance, but accessibility from virtually any language. Creating libraries is relatively simple and is covered in the section called “Creating your own Libraries”. Code added to a C library will never need to be duplicated, since anyone can use it from virtually any programming language. Typically, about 2/3 of all the C code I write ends up in libraries and only 1/3 is limited to a particular application.

C was created in the early 1970s and was improved in some important ways over the next few decades. It has changed very little since the 1990s, however. The fact that C is now stable also means that programs written in C will require minimal maintenance for years to come. Many other popular languages are still evolving rapidly. Some C++, MATLAB, Python, and R code written 10 years ago no longer works with the latest compilers or interpreters. Perl 6 (since renamed Raku) is a drastically different language from Perl 5. The changes needed to keep old code working are usually small, but a fair amount of expertise is required to make them. All changes to programs are time-consuming and require a new round of testing, as they almost invariably introduce new bugs.

Critics of C will often state that it is obsolete because it is not an object-oriented language. However, object-oriented programming is a design discipline, not a language feature. It is possible to implement object-oriented programs in any language, and in fact not at all difficult in C. The only features required for basic OOP (object-oriented programming) are structures and typedefs. This is discussed in more detail in Chapter 30, Object Oriented Programming. You can also find plenty of information on the web about doing object-oriented programming in C (OOP-in-C).

Conversely, it is both possible and common to write non-object-oriented code in an object-oriented language. Many programmers who do not understand object-oriented design believe that they are doing object-oriented programming simply because they are using an object-oriented language such as C++ or Java. This is a non sequitur.

Many features of object-oriented languages, such as multiple inheritance, friend classes, and delegating copy constructors, are not essential to object-oriented design. They are more about convenience in implementation. Such features allow you to reduce code size in some situations, in exchange for a much higher learning curve and code complexity.

C++ is very nearly a superset of C, which means that most C source code can be used directly in C++ programs. Unlike C, C++ is an extremely complex language which has drawn sharp criticism from some big names in computer science. (See https://en.wikipedia.org/wiki/C%2B%2B.) It contains many advanced features of questionable value, so many in fact, that very few C++ programmers fully understand the language. As a result, most C++ programmers use only a subset of its features. Different programmers use different subsets and often have a hard time understanding each other's code.

Porting C++ code from one platform or compiler to another often runs into trouble because of this complexity as well. Before deciding to implement a project in C++, ask yourself if the features of the language are really going to save you more time and effort than the complexity of the language will cost you. As most scientific software does not use complex data structures, you may be better off using plain C and a little self-discipline in cases where you want to maximize performance.

A little knowledge is a dangerous thing. This age-old adage is especially true in C++ development. In the words of Mr. Miyagi, "Karate do, or karate no do. Karate maybe so, and (sound of neck breaking)." I advise against using C++ unless you are prepared to devote the enormous amount of time required to learn it well. If you are not, then you will struggle and produce low-quality code, which is the last thing the scientific computing community needs more of. A good college student can master C in one semester. C++ will require three or four.

You may decide to learn C++ in order to utilize an existing library that is written in C++. Sometimes suitable alternatives are available to use from C or Fortran. BLAS and LAPACK can be used from any language, but Eigen, which has similar functionality, requires C++. If you require a function of Eigen that is not available in any other library, then you either have to write an equivalent function yourself or learn C++.

Even if using C++ for a given project is preferable, this does not mean that you have to do all your coding in C++. If you develop your own libraries, you may choose to use C in order to maximize their accessibility from other languages, including C++, and not force others to tackle the complexities of C++ in order to use them.

This introduction will focus on C in order to help you become proficient in a compiled language as quickly as possible. Since C is very nearly a subset of C++, almost everything discussed here will directly benefit those who want to continue on and learn C++ eventually. Most readers will find that C is more than adequate for their needs and will be better off focusing on mastering general programming skills rather than spending that time learning the features of more complex languages.

Not Your Grandfather's Fortran

Fortran was the first widely available high-level language, originally created in the 1950s. If you tell people, especially computer scientists, that you're learning Fortran, you may get some odd looks and snide remarks such as "I thought all the Fortran programmers were dead", or "Wow, I bet you had a pet mastodon when you were little".

What you're learning here is not your grandfather's Fortran, however. Fortran has gone through a number of major evolutionary steps, and has been greatly enhanced since the early days. Fortran 90 brought some particularly important improvements, such as free-format source code (Fortran versions up to 77 require a strict line structure) and more support for structured code. Fortran has always had intrinsic support for complex numbers, and newer versions support many matrix operations like those found in MATLAB and other higher-level tools.

Fortran is a compiled language, so well-written Fortran programs run about as fast as any program could, and nearly as fast as C in many cases.

Fortran is an open standard language, so there are many compilers available from multiple vendors. There are also free Fortran compilers for most common computer hardware and operating systems.

Practice

Note

Be sure to thoroughly review the instructions in Section 2, “Practice Problem Instructions” before doing the practice problems below.

What are two benefits of learning two languages at once?
What are three benefits of the simplicity of C?
What is the benefit of a stable language, i.e. one that is not evolving rapidly?
Can we do object-oriented programming in C?
Is learning C a waste of time for those who need to use C++?
What kind of performance can you expect from well-written Fortran programs?
