Compiled vs Interpreted Languages

Language Performance

When performance is a concern, use a purely compiled language. Interpreted programs are run by another program called an interpreter. Even the most efficient interpreter spends more than 90% of its time parsing (interpreting) your code and less than 10% performing the useful computations it was designed for. Most spend more than 99% of their time parsing and less than 1% running. All of this wasted CPU time is incurred every time you run the program.

Note also that when you run an interpreted program, the interpreter is competing for memory with the very program it is running. Hence, in addition to running an order of magnitude or more slower than a compiled program, an interpreted program will generally require more memory resources to accommodate both your program and the interpreter at the same time.

With a compiled program, the compiler does all this parsing ahead of time, before you run the program. You need only compile your program once, and can then run it as many times as you want. Hence, compiled code tends to run anywhere from tens to thousands of times faster than interpreted code.

For interactive programs, maximizing performance is generally not a major concern, so little effort goes into optimization. Users don't usually care whether a program responds to their request in 1/2 second or 1/100 second.

In High Performance Computing, on the other hand, the primary goal is almost always to minimize run time. Most often, it's big gains that matter - reducing run time from months or years to hours or days. Sometimes, however, researchers are willing to expend a great deal of effort to reduce run times by even a few percent.

There is a middle class of languages, which we will call pseudo-compiled for lack of a better term. The most popular among them are Java and the Microsoft .NET languages. These languages are "compiled" to a byte code that looks more like machine language than like source code. However, this byte code is not the native machine language of the hardware, so an interpreter is still required to run it. Interpreting byte code is far less expensive than interpreting human-readable source code, so such programs run significantly faster than programs in purely interpreted languages.

In addition to pseudo-compilation, some languages such as Java include a Just-In-Time (JIT) compiler. The JIT compiler converts the byte code of a program to native machine language while the program executes. This actually makes the code even slower the first time each statement executes, but subsequent iterations then run at compiled speed. Since most programs contain many loops, the net effect is program execution closer to the speed of compiled languages.

Table 14.2, “Selection Sort of 100,000 Integers on AMD Opteron” shows the run times of a selection sort program written in various languages and run on a 3.1 GHz AMD Opteron 4386 processor under FreeBSD. Note that the relative performance of programs is highly dependent on the specific hardware. For this reason, results from the same benchmark on an Intel i7-3770 processor are also shown in Table 14.3, “Selection Sort of 100,000 Integers on Intel i7”. Note how the relative performance of the various C and C++ compilers and array indexing methods differs between the i7 and the Opteron. The latest detailed results of this benchmark are available at https://github.com/outpaddling/Lang-speed/tree/master/Results.

FreeBSD was chosen for this benchmark in part because it provides for simple installation of all the latest compilers and interpreters, except for MATLAB, which is a commercial product that must be installed manually along with a Linux compatibility module.

Note

FreeBSD's Linux compatibility is not an emulation layer. It is a kernel module that directly supports Linux system calls where they differ from FreeBSD's. There is no performance penalty for running Linux binaries on FreeBSD. In fact, Linux binaries sometimes run slightly faster on FreeBSD than on Linux.

All compiled programs were built with typical safe optimizations, e.g. -O2 for C, C++, and Fortran. Run time was determined using the time command, and memory use was determined by monitoring with top. The code for generating these data, along with full output from selected runs, is available on GitHub.

Table 14.2. Selection Sort of 100,000 Integers on AMD Opteron

Compiler/Interpreter    Execution method  Indexing    Time (seconds)  Peak memory
C clang18               Compiled          subscripts        8.99        2.8 MiB
C clang18               Compiled          pointers          4.69        2.8 MiB
C gcc14                 Compiled          subscripts        7.91        2.7 MiB
C gcc14                 Compiled          pointers          6.48        2.7 MiB
C++ clang++18           Compiled          subscripts        7.68        3.9 MiB
C++ clang++18           Compiled          pointers          4.84        4.0 MiB
C++ clang++18           Compiled          vectors           7.67        4.0 MiB
C++ g++14               Compiled          subscripts        8.09        4.5 MiB
C++ g++14               Compiled          pointers          6.63        4.6 MiB
C++ g++14               Compiled          vectors           8.16        4.6 MiB
Fortran gfortran14      Compiled          subscripts        6.45        3.9 MiB
rust-1.81.0             Compiled          subscripts        7.48        3.2 MiB
go-1.21.13              Compiled          subscripts        9.52        9.8 MiB
java-1.8.0-jit          JIT               subscripts        9.99       68.7 MiB
python-3.11.10+numba    JIT               subscripts       16.64      136.8 MiB
python-3.11.10-vectors  Interpreted       subscripts      262.25       11.4 MiB
python-3.11.10-loops    Interpreted       subscripts      463.75       11.2 MiB
perl-5.36.3-vectors     Interpreted       subscripts      657.00        9.3 MiB
perl-5.36.3-loops       Interpreted       subscripts      786.50        7.9 MiB
Octave-9.2.0-vectors    Interpreted       subscripts       10.42       77.5 MiB
Octave-9.2.0-loops      Interpreted       subscripts    33116.00       75.7 MiB

Table 14.3. Selection Sort of 100,000 Integers on Intel i7

Compiler/Interpreter    Execution method  Indexing    Time (seconds)  Peak memory
C clang18               Compiled          subscripts        4.10        2.6 MiB
C clang18               Compiled          pointers          4.16        2.6 MiB
C gcc14                 Compiled          subscripts        2.76        2.6 MiB
C gcc14                 Compiled          pointers          2.75        2.6 MiB
C++ clang++18           Compiled          subscripts        4.22        3.7 MiB
C++ clang++18           Compiled          pointers          4.28        3.8 MiB
C++ clang++18           Compiled          vectors           4.21        3.8 MiB
C++ g++14               Compiled          subscripts        2.81        4.3 MiB
C++ g++14               Compiled          pointers          2.80        4.4 MiB
C++ g++14               Compiled          vectors           2.82        4.4 MiB
Fortran gfortran14      Compiled          subscripts        2.77        3.8 MiB
rust-1.81.0             Compiled          subscripts        5.48        3.2 MiB
go-1.21.13              Compiled          subscripts        7.13        9.5 MiB
java-1.8.0-jit          JIT               subscripts        7.34       62.3 MiB
python-3.11.10+numba    JIT               subscripts       10.61      125.5 MiB
python-3.11.10-vectors  Interpreted       subscripts      156.25       11.2 MiB
python-3.11.10-loops    Interpreted       subscripts      274.25       11.1 MiB
perl-5.36.3-vectors     Interpreted       subscripts      448.00        9.2 MiB
perl-5.36.3-loops       Interpreted       subscripts      624.00        7.8 MiB
Octave-9.2.0-vectors    Interpreted       subscripts        9.63       81.8 MiB
Octave-9.2.0-loops      Interpreted       subscripts    14056.00       80.9 MiB

This selection sort benchmark provides a rough estimate of the relative speeds of languages when using explicit loops and arrays. There are, of course, better ways to sort data. For example, the standard C library contains a qsort() function that is far more efficient than selection sort. Selection sort was chosen here because it contains a typical nested loop representative of many scientific programs.
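For reference, the selection sort algorithm used in the benchmark can be sketched in a few lines of Python. This is a minimal illustration, not the actual benchmark code, which is available on GitHub:

```python
def selection_sort(a):
    """Sort list a in place by repeatedly finding the smallest
    remaining element and swapping it into position."""
    n = len(a)
    for i in range(n - 1):
        # Inner loop: find the index of the smallest element in a[i:]
        low = i
        for j in range(i + 1, n):
            if a[j] < a[low]:
                low = j
        # Swap the smallest remaining element into position i
        a[i], a[low] = a[low], a[i]
    return a

print(selection_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```

The nested loop is the reason this is an O(N²) algorithm: for each of the N positions, the inner loop scans the remaining unsorted elements.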

Memory is allocated for programs from a pool of virtual memory, which includes RAM (fast electronic memory) + an area on the hard disk known as swap, where blocks of data from RAM are sent when RAM is in short supply. The portion of allocated memory that resides in RAM at any given moment is called the "resident set". Tables 14.2 and 14.3 show the peak RAM use (Resident Set Size, or RSS) of each program. The virtual memory allocated is also important, but there is no simple way to measure it.

Programs that run 100 times as fast don't just save time, but also extend battery life on a laptop or handheld device and reduce your electric bill and pollution. The electric bill for a large HPC cluster can be thousands of dollars per month, so improving software performance by orders of magnitude can have a huge financial impact.

Vectorizing Interpreted Code

Most interpreted languages offer a variety of built-in features and functions, as well as external libraries, that execute operations on vectors (arrays, matrices, lists, etc.) at compiled speed, because they are written in compiled languages. These features and functions use compiled loops under the hood rather than explicit interpreted loops. Some programmers may not even realize that they are using a loop, but most operations that process multiple values require a loop at the hardware level, whether or not it is visible in the source code.

For example, MATLAB allows us to perform many operations on entire vectors or matrices without writing an explicit loop:

% MATLAB explicit interpreted loop to initialize an array
for c = 1:100000
    list1(c) = c;
end

% MATLAB vectorized list initialization
% Achieves exactly the same result, but faster than the interpreted loop above
list1 = [1:100000];
            

As another example, many interpreted languages provide a min() function for finding the smallest element in a list. This function may be written in a compiled language, in which case it will be many times faster than an explicit interpreted loop that searches the list. This, along with the MATLAB example above, is an example of vectorizing code, i.e. using built-in vector operations instead of explicit loops.
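To make the contrast concrete, here is a small Python sketch (not one of the benchmark scripts) comparing an explicit interpreted loop with the built-in min(), whose loop runs inside CPython's compiled C implementation:

```python
data = [7, 3, 9, 1, 4]

# Explicit interpreted loop: every iteration is parsed and
# dispatched by the interpreter
smallest = data[0]
for x in data[1:]:
    if x < smallest:
        smallest = x

# Vectorized equivalent: the same loop runs in compiled code
# inside the built-in min() function
assert smallest == min(data) == 1
```

Both find the same answer; only the location of the loop (interpreted bytecode vs. compiled library code) differs, and on a large list that difference dominates the run time.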

If your interpreted program does most of its computation using compiled functions and vector operations, it may run several times faster than the same program using explicit interpreted loops. However, it will still not approach the speed of a compiled program, and only tasks that can utilize the finite set of compiled functions and vector features will perform well. Not every array and matrix operation can be accomplished using built-in functions or vector operations. When you have to use an explicit interpreted loop, it will be very slow.

This often leaves you with no good options when dealing with large amounts of data. In order to vectorize an operation, the program must inhale all of the data into an array or similar data structure. While this will speed up the code by replacing an interpreted loop with a compiled one, it also slows down memory access by overflowing the cache, and in some cases could even cause memory exhaustion, where the system runs out of swap. In that scenario, the program simply cannot finish.

Arrays are often not inherently necessary to implement an algorithm. For example, adding two matrices stored in files and writing the sum to a third file does not require reading the matrices into arrays. If we can process one value at a time without using an array, the code will be simpler and not limited by memory capacity. However, using an explicit loop in an interpreted language in order to avoid arrays will reduce performance by orders of magnitude. The only option that both maximizes speed and minimizes memory use may be rewriting the code in a compiled language. In a compiled language, it is almost always preferable not to use arrays, unless there is a block of data that must be accessed repeatedly.
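The matrix-addition example above can be sketched in Python as follows. The function name and the one-row-per-line, whitespace-separated file format are hypothetical, chosen only for illustration:

```python
def add_matrix_files(path_a, path_b, path_out):
    """Add two matrices stored in text files (one row per line,
    values separated by whitespace) and write the sum to a third
    file, without ever holding a whole matrix in memory."""
    with open(path_a) as fa, open(path_b) as fb, \
         open(path_out, "w") as out:
        # Only one row of each matrix is in memory at a time
        for row_a, row_b in zip(fa, fb):
            sums = [float(x) + float(y)
                    for x, y in zip(row_a.split(), row_b.split())]
            print(*sums, file=out)
```

Because memory use is limited to one row per matrix, this approach works the same for a 10 x 10 matrix or a matrix too large to fit in RAM.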

As an example, the R version of the selection sort program was written in two ways: One with entirely explicit loops (the slowest approach with an interpreted language) and the other using an intrinsic function, which.min(). The which.min() function uses a compiled loop behind the scenes to find the smallest element in the list, the most expensive part of the selection sort algorithm. As you can see in Table 14.2, “Selection Sort of 100,000 Integers on AMD Opteron”, this leads to a dramatic improvement in speed, but still does not bring R close to the speed and memory efficiency of a compiled language. Using which.min() with a subset of the list also has a trade-off in that it vastly increases memory use.
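The same technique can be sketched in Python with NumPy, where argmin() plays the role of R's which.min(). This is an illustrative sketch, not the benchmark code; the inner search loop runs in compiled code, while the outer loop remains interpreted:

```python
import numpy as np

def selection_sort_vectorized(a):
    """Selection sort with the inner search vectorized:
    np.argmin() scans the unsorted tail at compiled speed."""
    a = np.array(a)
    for i in range(len(a) - 1):
        # Compiled search replaces the interpreted inner loop
        low = i + np.argmin(a[i:])
        a[i], a[low] = a[low], a[i]
    return a

print(selection_sort_vectorized([5, 2, 9, 1, 7]).tolist())  # [1, 2, 5, 7, 9]
```

Note that the slice a[i:] in each iteration is the memory trade-off mentioned above: vectorizing the search requires the whole remaining list to be present in an array.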

Vectorized variants of the MATLAB, Python, and Perl sort scripts were also tested, with varying results. Vectorizing the Python and Perl scripts did not yield nearly as much performance improvement, but on the bright side, the memory trade-off was negligible.

Compiled Languages Don't Guarantee a Fast Program

Choosing a compiled language is very important to execution speed, but it is not the whole story. It will not guarantee that your code is as fast as it could be, although it will be far faster than the same algorithm implemented in an interpreted language. Choosing the best algorithms can actually be far more important in some cases. For this reason, computer science departments focus almost exclusively on algorithms when teaching software performance.

Selection sort, for example, is an O(N^2) algorithm, which means that the run time of a selection sort program is proportional to N^2 for a list of N elements. I.e., the run time is N^2 times some constant K determined by the implementation.

Heap sort, in contrast, is an O(N * log(N)) algorithm, which means that its execution time is proportional to N * log(N) for a list of N elements.
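As an illustration (not the benchmark code), a heap sort can be written in Python using the standard heapq module:

```python
import heapq

def heap_sort(a):
    """O(N * log(N)): building the heap is O(N), and each of the
    N extractions costs at most O(log(N))."""
    heap = list(a)
    heapq.heapify(heap)          # arrange values into a min-heap
    return [heapq.heappop(heap)  # N pops, O(log N) each
            for _ in range(len(heap))]

print(heap_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```

Even run by the interpreter, this beats a compiled selection sort once N is large enough, because the algorithmic advantage grows without bound.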

If N is large, say 10^9, then N^2 is 10^18, whereas N * log(N) is 9 * 10^9, or approximately 10^10. Hence, the heap sort will be roughly a hundred million (10^18 / 10^10 = 10^8) times as fast as selection sort for a list of a billion values. Obviously this is more important than the 100-fold speedup we get from a compiled language. Using the best algorithms and a compiled language is clearly the best way to maximize performance.
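The arithmetic is easy to verify with a few lines of Python, using base-10 logarithms as in the estimate above:

```python
import math

N = 10**9
n_squared = N * N              # selection sort: 10^18 steps
n_log_n = N * math.log10(N)    # heap sort: 9 * 10^9 steps

# Ratio of the two step counts: about 1.1 * 10^8,
# i.e. roughly a hundred million
print(n_squared / n_log_n)
```

The exact ratio depends on the base of the logarithm and the constant factors of each implementation, but the conclusion is the same for any base: the algorithmic speedup dwarfs the speedup from compilation alone.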

No, Compiled Languages are Not Hard

You may encounter criticism of compiled languages for being harder to use than interpreted languages. This is another opportunity for a Socratic examination. If you ask the people telling you this to clarify with some specific examples, you will usually find that they don't know very much about the compiled languages they fear.

For example, many MATLAB users believe that MATLAB is easier because it has "built-in" matrix capabilities. However, most common vector and matrix operations in MATLAB are readily available to C, C++, and Fortran programs in free libraries such as BLAS and LAPACK. In fact, MATLAB uses these same libraries for many of its calculations.

It is often (erroneously) stated that interpreted languages are easier to use because you don't have to compile the program before you run it. I struggled for years trying to understand the origin of this myth. Asking people to explain only revealed their ignorance on the subject, as Socrates often experienced.

I know that compiling programs is no harder than interpreting programs, because I have been doing both, using many languages, for decades. My best guess regarding why people believe this, aside from accepting hearsay on faith, is that there are many existing open source projects with seriously brain-damaged build systems that most people struggle to install. However, this is in no way the fault of the language. There is no reason that a build system (such as a makefile, discussed in Chapter 22, Building with Make) should be difficult to use. This is entirely on the developer who wrote it. Furthermore, if you install programs via a package manager, difficulties with building the program are not your concern. Any problems will have been resolved by the package maintainers.

The fact that Random Dude made a mess of his compiled code and build system does not mean that you will have a hard time programming in a compiled language. Go about it intelligently, and it will not be much different from using an interpreted language.

Summary

The pool of different programming languages in existence today is vast and growing at an accelerating rate. As a result, it is becoming harder and harder to find programmers with skills in any specific language.

The rational approach to overcoming this situation is to develop good general programming skills that will apply to most languages. Stay away from proprietary or esoteric language constructs, tools, and techniques, and focus on practices that will be useful in many languages and on many operating systems.

To develop depth of knowledge, focus on mastering one general-purpose scripting language and one general-purpose compiled language. The skills you develop this way are more likely to serve you directly, but will also help you conquer other languages more easily if it proves necessary.

Don't be taken in by the false promises of "higher level" languages. The reality is that programming doesn't get much easier than it is in C or Fortran. Other languages may seem easier to the beginning programmer, but as you progress you will find that overcoming their limitations is impossible or at least more difficult than coding in C.

Even if you only know how to use interpreted languages, you will have access to a wide range of high performance code written by others, such as the C and Fortran code behind MATLAB, Octave, and Python libraries such as NumPy. You will not, however, be able to create your own high performance code. To do that, you should learn C (and maybe later expand into C++) or Fortran. Other compiled languages exist, but the vast majority of existing code is written in C, C++, and Fortran. You will not always be writing new code from scratch, but will often find a need to modify or improve existing code instead.

Practice

Note

Be sure to thoroughly review the instructions in Section 2, “Practice Problem Instructions” before doing the practice problems below.
  1. Which runs faster, a program written in a compiled language or the same program in an interpreted language? Why? By how much?

  2. Which uses less memory, a program written in a compiled language or the same program in an interpreted language? Why? By how much?

  3. Which is better for an interactive program that doesn't do much computation, a compiled language, or an interpreted language?

  4. Which is better for an interactive program that does heavy computation, a compiled language, or an interpreted language?

  5. Is Java compiled or interpreted? Explain.

  6. List four advantages of a program that runs faster.

  7. What does it mean to vectorize code?

  8. What is one advantage and one disadvantage of vectorizing? Explain.

  9. If you need to improve performance of an interpreted language program, but don't want to increase memory use, what can you do?

  10. Does using a compiled language ensure that your programs will be fast? Explain.

  11. Are compiled languages inherently harder to use than interpreted languages? Explain.

  12. How can one cope with the enormous number of different programming languages available today?