When performance is a concern, use a purely compiled language. Interpreted programs are run by another program called an interpreter. Even the most efficient interpreter spends more than 90% of its time parsing (interpreting) your code and less than 10% performing the useful computations it was designed for. Most spend more than 99% of their time parsing and less than 1% running. All of this wasted CPU time is incurred every time you run the program.
Note also that when you run an interpreted program, the interpreter is competing for memory with the very program it is running. Hence, in addition to running an order of magnitude or more slower than a compiled program, an interpreted program will generally require more memory resources to accommodate both your program and the interpreter at the same time.
With a compiled program, the compiler does all this parsing ahead of time, before you run the program. You need only compile your program once, and can then run it as many times as you want. Hence, compiled code tends to run anywhere from tens to thousands of times faster than interpreted code.
For interactive programs, maximizing performance is generally not a major concern, so little effort goes into optimization. Users don't usually care whether a program responds to their request in 1/2 second or 1/100 second.
In High Performance Computing, on the other hand, the primary goal is almost always to minimize run time. Most often, it's big gains that matter - reducing run time from months or years to hours or days. Sometimes, however, researchers are willing to expend a great deal of effort to reduce run times by even a few percent.
There is a middle class of languages, which we will call pseudo-compiled for lack of a better term. The most popular among them are Java and the Microsoft .NET languages. These languages are "compiled" to a byte code that looks more like machine language than the source code. However, this byte code is not the native machine language of the hardware, so an interpreter is still required to run it. Interpreting this byte code is far less expensive than interpreting human-readable source code, so pseudo-compiled programs run significantly faster than those in purely interpreted languages.
In addition to pseudo-compilation, some languages such as Java include a Just-In-Time (JIT) compiler. The JIT compiler converts the byte code of a program to native machine language while the program executes. This actually makes the interpreted code even slower the first time it executes each statement, but it will then run at compiled speed for subsequent iterations. Since most programs contain many loops, the net effect is program execution closer to the speed of compiled languages.
Table 13.2, “Selection Sort of 200,000 Integers” shows the run time (wall time) of a selection sort program written in various languages and run on a 2.9 GHz Intel i5 processor under FreeBSD 13.0. FreeBSD was chosen for this benchmark in part because it provides for simple installation of all the latest compilers and interpreters, except for MATLAB, which is a commercial product that must be installed manually along with a Linux compatibility module.
Table 13.2. Selection Sort of 200,000 Integers
| Language (Compiler) | Execution method | Time (seconds) | Memory |
|---|---|---|---|
| C (clang 11) | Compiled | 7.8 | 11 MB (2.7 resident) |
| C++ (clang++ 11) | Compiled | 8.1 | 13 MB (3.9 resident) |
| C (gcc 10) | Compiled | 12.3 | 11 MB (2.7 resident) |
| C++ (g++ 10) | Compiled | 12.4 | 14 MB (4.54 resident) |
| Fortran (gfortran 10) | Compiled | 12.4 | 15 MB (3.65 resident) |
| Fortran (flang 7) | Compiled | 12.5 | 20 MB (7.11 resident) |
| D (LDC 1.23.0) | Compiled | 30.5 | 16 MB (5.2 resident) |
| Rust 1.57 | Compiled | 30.6 | 13 MB (3.5 resident) |
| Java 11 with JIT | Pseudo-compiled + JIT compiler | 31.4 | 3,430 MB (44 resident) |
| Octave 6.4.0 vectorized (use min(list(start:list_size)) to find minimum) | Interpreted (no JIT compiler yet) | 34.9 | 345 MB (103 resident) |
| Go 1.17 | Compiled | 37.7 | 689 MB (14 resident) |
| Pascal (Free Pascal 3.2.2) | Compiled | 43.0 | 2.2 MB (1.1 resident) |
| Python 3.8 with numba JIT 0.51 | Interpreted + JIT compiler | 44.5 | 305 MB (107 resident) |
| MATLAB 2018a vectorized (use min(list(start:list_size)) to find minimum) | Interpreted + JIT compiler | 51.7 | 5,676 MB (251 resident) |
| R 4.1.2 vectorized (use which.min(nums[top:last]) to find minimum) | Interpreted | 97.4 | 1,684 MB (1,503 resident) |
| MATLAB 2018a non-vectorized (use explicit loop to find minimum) | Interpreted + JIT compiler | 112.2 (1.87 minutes) | 5,678 MB (248 resident) |
| Java 11 without JIT | Pseudo-compiled | 408.8 (6.8 minutes) | 3,417 MB (38 resident) |
| Python 3.8 vectorized (use min() and list.index() to find minimum) | Interpreted | 758.8 (12.6 minutes) | 28 MB (18 resident) |
| Perl 5.32 vectorized (use reduce to find minimum) | Interpreted | 2,040.3 (34.0 minutes) | 41 MB (32 resident) |
| R 4.1.2 non-vectorized (use explicit loop to find minimum) | Interpreted | 2,159.5 (36.0 minutes) | 120 MB (70 resident) |
| Python 3.8 non-vectorized (use explicit loop to find minimum) | Interpreted | 2,362.3 (39.4 minutes) | 26 MB (16 resident) |
| Perl 5.32 non-vectorized (use explicit loop to find minimum) | Interpreted | 2,564.5 (42.7 minutes) | 33 MB (23 resident) |
| Octave 6.4.0 non-vectorized (use explicit loop to find minimum) | Interpreted (no JIT compiler yet) | 80,096.0 (1,334.9 minutes, 22.2 hours) | 345 MB (103 resident) |
This selection sort benchmark serves to provide a rough estimate of the relative speeds of languages when you use explicit loops and arrays. There are, of course, better ways to sort data. For example, the standard C library contains a qsort() function which is far more efficient than selection sort. The Unix sort command can sort the list in less than 1 second on the same machine. Selection sort was chosen here because it contains a typical nested loop representative of many scientific programs.
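As a concrete illustration of that nested loop, here is a minimal selection sort sketch in Python. This is not the benchmark code itself (which is on GitHub), just a bare-bones version showing the structure being timed:

```python
def selection_sort(nums):
    """Sort a list in place using selection sort, an O(N^2) algorithm."""
    n = len(nums)
    for top in range(n - 1):
        # Inner loop: find the index of the smallest element in nums[top:]
        low = top
        for i in range(top + 1, n):
            if nums[i] < nums[low]:
                low = i
        # Swap the smallest remaining element into position
        nums[top], nums[low] = nums[low], nums[top]
    return nums

print(selection_sort([4, 2, 7, 1]))  # [1, 2, 4, 7]
```

The inner loop is the expensive part: it runs N-1, N-2, ... 1 times, for roughly N²/2 comparisons in total.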
All compiled programs were built with standard optimizations. Run time was determined using the time command and memory use was determined by monitoring with top. The code for generating these data is available on GitHub.
Memory is allocated for programs from a pool of virtual memory, which includes RAM (fast electronic memory) plus an area on the hard disk known as swap, where blocks of data from RAM are sent when RAM is in short supply. The first number shows the virtual memory allocated, and the second shows the amount of data that resides in RAM. Both numbers are important.
Programs that run 100 times as fast don't just save time, but also extend battery life on a laptop or handheld device and reduce your electric bill and pollution. The electric bill for a large HPC cluster is thousands of dollars per month, so improving software performance by orders of magnitude can have a huge financial impact.
Most interpreted languages offer a variety of built-in features and functions, as well as external libraries, that execute operations on vectors (arrays, matrices, lists, etc.) at compiled speed (because they are written in compiled languages). These features and functions use compiled loops under the hood rather than explicit interpreted loops. Some programmers may not even realize that they are using a loop, but most operations that process multiple values require a loop at the hardware level, whether or not it is visible in the source code.
For example, MATLAB allows us to perform many operations on entire vectors or matrices without writing an explicit loop:
% MATLAB explicit interpreted loop to initialize an array
for c = 1:100000
    list1(c) = c;
end

% MATLAB vectorized list initialization
% Achieves exactly the same result, but faster than the interpreted loop above
list1 = [1:100000];
As another example, many interpreted languages provide a min() function for finding the smallest element in a list. This function may be written in a compiled language, in which case it will be many times faster than an explicit interpreted loop that searches the list. This function, along with the MATLAB example above, illustrates vectorizing code, i.e., using built-in vector operations instead of explicit loops.
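In Python, for example, the contrast looks like this; both approaches find the same value, but the built-in min() runs its loop in compiled C code inside the interpreter:

```python
import random

nums = [random.random() for _ in range(100000)]

# Explicit interpreted loop: every comparison is executed by the interpreter
smallest = nums[0]
for x in nums[1:]:
    if x < smallest:
        smallest = x

# Vectorized equivalent: the loop happens in compiled code
assert smallest == min(nums)
```

The built-in is typically many times faster, even though both perform the same number of comparisons at the hardware level.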
If your interpreted program does most of its computation using compiled functions and vector operations, it may run several times faster than the same program using explicit interpreted loops. However, it will still not approach the speed of a compiled program, and only tasks that can utilize the finite set of compiled functions and vector features will perform well. Not every array and matrix operation can be accomplished using built-in functions or vector operations. When you have to use an explicit interpreted loop, it will be very slow.
This often leaves you with no good options when dealing with large amounts of data. In order to vectorize an operation, the program must inhale all of the data into an array or similar data structure. While this will speed up the code by replacing an interpreted loop with a compiled one, it also slows down memory access by overflowing the cache, and in some cases could even cause memory exhaustion, where the system runs out of swap. In that scenario, the program simply cannot finish.
Arrays are often not inherently necessary to implement the algorithm. For example, adding two matrices stored in files and writing the sum to a third file does not require reading the matrices into arrays. If we can process one value at a time without using an array, the code will be simpler, and not limited by memory capacity. However, using an explicit loop in an interpreted language in order to avoid using arrays will reduce performance by orders of magnitude. The only option to maximize speed and minimize memory use may be rewriting the code in a compiled language. If using a compiled language, it is almost always preferable not to use arrays, unless there is a block of data that must be accessed repeatedly.
As an example, the R version of the selection sort program was written in two ways: one with entirely explicit loops (the slowest approach with an interpreted language) and the other using the intrinsic which.min() function, which uses a compiled loop behind the scenes to find the smallest element in the list, the most expensive part of the selection sort algorithm. As you can see in Table 13.2, “Selection Sort of 200,000 Integers”, this leads to a dramatic improvement in speed, but still does not bring R close to the speed and memory efficiency of a compiled language. Applying which.min() to a subset of the list also has a trade-off in that it vastly increases memory use.
Vectorized variants of the MATLAB, Python, and Perl sort scripts were also tested, with varying results. Vectorizing Python and Perl did not improve performance nearly as much as it did for R, but on the bright side, the memory trade-off was negligible.
Choosing a compiled language is very important to execution speed, but is not the whole story. It will not guarantee that your code is as fast as it could be, although it will be far less slow than the same algorithm implemented in an interpreted language. Choosing the best algorithms can actually be far more important in some cases. For this reason, computer science departments focus almost exclusively on algorithms when teaching software performance.
Selection sort, for example, is an O(N²) algorithm, which means that the run time of a selection sort program is proportional to N² for a list of N elements. I.e., the run time is N² times some constant determined by the implementation.
Heap sort is an O(N × log(N)) algorithm, which means that the execution time of a heap sort program is proportional to N × log(N) for a list of N elements.
If N is large, say 10⁹, then N² is 10¹⁸, whereas N × log(N) is 9 × 10⁹, or approximately 10¹⁰. Hence, the heap sort will be about 100 million times as fast as selection sort for a list of a billion values. Obviously this is more important than the 100-fold speedup we get from a compiled language. Using the best algorithms and a compiled language is clearly the best way to maximize performance.
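The arithmetic above is easy to check directly; here log base 10 is used, matching the 9 × 10⁹ figure (the base only changes the constant factor, not the complexity class):

```python
import math

N = 10**9
selection_ops = N * N               # O(N^2) work: 10^18 operations
heapsort_ops = N * math.log10(N)    # O(N log N) work: 9 * 10^9 operations

# Selection sort does roughly 10^8 (about 100 million) times the work
ratio = selection_ops / heapsort_ops
print(f"{ratio:.2e}")
```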
You may encounter criticism of compiled languages for being harder to use than interpreted languages. This is another opportunity for a Socratic examination. If you ask the people telling you this to clarify with some specific examples, you will usually find that they don't know very much about the compiled languages they fear.
For example, many MATLAB users believe that MATLAB is easier because it has "built-in" matrix capabilities. However, most common vector and matrix operations in MATLAB are readily available to C, C++, and Fortran programs in free libraries such as BLAS and LAPACK. In fact, MATLAB uses these same libraries for many of its calculations.
It is often (erroneously) stated that interpreted languages are easier to use because you don't have to compile the program before you run it. I struggled for years trying to understand the origin of this myth. Asking people to explain only revealed their ignorance on the subject, as Socrates often experienced.
I know that compiling programs is no harder than interpreting programs, because I have been doing both, using many languages, for decades. My best guess regarding why people believe this, aside from accepting hearsay on faith, is that there are many existing open source projects with seriously brain-damaged build systems, which most people struggle to install. However, this is in no way the fault of the language. There is no reason that a build system (such as a makefile, discussed in Chapter 21, Building with Make) should be difficult to use. This is entirely on the developer who wrote it. Furthermore, if you install programs via a package manager, difficulties with building the program are not your concern. Any problems will have been resolved by the package manager.
The fact that Random Dude made a mess of his compiled code and build system does not mean that you will have a hard time programming in a compiled language. Go about it intelligently, and it will not be much different than using an interpreted language.
The pool of different programming languages in existence today is vast and growing at an accelerating rate. As a result, it is becoming harder and harder to find programmers with skills in a specific language.
The rational approach to overcoming this situation is to develop good general programming skills that will apply to most languages. Stay away from proprietary or esoteric language constructs, tools, and techniques, and focus on practices that will be useful in many languages and on many operating systems.
To develop depth of knowledge, focus on mastering one general-purpose scripting language and one general-purpose compiled language. The skills you develop this way are more likely to serve you directly, but will also help you conquer other languages more easily if it proves necessary.
Don't be taken in by the false promises of "higher level" languages. The reality is that programming doesn't get much easier than it is in C or Fortran. Other languages may seem easier to the beginning programmer, but as you progress you will find that overcoming their limitations is impossible or at least more difficult than coding in C.
If you only know how to use interpreted languages, you will have access to a wide range of high performance code written by others, such as the C and Fortran code behind MATLAB, Octave, and Python libraries such as NumPy. You will not, however, be able to create your own high performance code. To do that you should learn C (and maybe later expand into C++) or Fortran. Other compiled languages exist, but the vast majority of existing code is written in C, C++, and Fortran. You will not always be writing new code from scratch, but will often find a need to modify or improve existing code instead.
Which runs faster, a program written in a compiled language or the same program in an interpreted language? Why? By how much?
Which uses less memory, a program written in a compiled language or the same program in an interpreted language? Why? By how much?
Which is better for an interactive program that doesn't do much computation, a compiled language, or an interpreted language?
Which is better for an interactive program that does heavy computation, a compiled language, or an interpreted language?
Is Java compiled or interpreted? Explain.
List four advantages of a program that runs faster.
What does it mean to vectorize code?
What is one advantage and one disadvantage of vectorizing? Explain.
If you need to improve performance of an interpreted language program, but don't want to increase memory use, what can you do?
Does using a compiled language ensure that your programs will be fast? Explain.
Are compiled languages inherently harder to use than interpreted languages? Explain.
How can one cope with the enormous number of different programming languages available today?