No matter what programming language you use, the data types chosen by the program will have a profound effect on performance, memory use, and correctness of the output. Many languages, such as Matlab, are typeless, which means we do not control the data types of our variables. Instead, the language chooses a type that is likely (but not guaranteed) to have enough range and precision. This often results in a performance penalty and/or greater memory use, since typeless languages play it safe to minimize the chance of overflow and round-off error.
C, C++, and Fortran give us full control over the data types of our variables and constants. Choosing data types is important in order to maximize the speed of your programs, minimize memory use, and ensure correct results. The approach to choosing a data type is basically the same in any programming language. We simply need to understand the limitations of the data types offered by the language and how they are supported by typical hardware.
Choosing the best data type always involves understanding the needs of the specific code you are working on. There are, however, a few general ideas that can be applied. The best choice will often depend on whether the variable is a scalar (dimensionless, holds only one value) or a large array (at least one dimension, such as a vector or matrix, holds multiple values). This determines whether we need to consider how much memory the variable uses.
Some general guidelines to apply to each specific situation:
Use integers instead of floating point whenever possible. They are faster and more precise. Recall from the section called “Floating Point” that floating point numbers are stored in a format like scientific notation, and hence floating point addition takes about three times as long as integer addition. The section called “Monte Carlo Experiments” shows an example program that runs about 2.5 times as fast when implemented with integers as when using floating point.
Integers are more precise because all the bits are used to represent digits, whereas a floating point value with the same number of bits uses some of them for the exponent, which does not contribute to precision. Only the bits used for the mantissa provide significant figures in our real numbers.
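The following is a minimal C sketch (the iteration count is arbitrary) for measuring this difference on your own hardware. The exact ratio varies with the CPU, compiler, and optimization level, and the volatile qualifiers keep the optimizer from removing the loops entirely:

    #include <stdio.h>
    #include <time.h>

    #define ITERATIONS  100000000L

    int main(void)
    {
        long    i;
        clock_t start;

        /* volatile prevents the compiler from optimizing the loops away */
        volatile long   isum = 0;
        volatile double fsum = 0.0;

        start = clock();
        for (i = 0; i < ITERATIONS; ++i)
            isum += i;
        printf("Integer: %.2f seconds\n",
               (double)(clock() - start) / CLOCKS_PER_SEC);

        start = clock();
        for (i = 0; i < ITERATIONS; ++i)
            fsum += (double)i;
        printf("Float:   %.2f seconds\n",
               (double)(clock() - start) / CLOCKS_PER_SEC);

        return 0;
    }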
We can often eliminate the need for floating point by simply choosing smaller units of measurement, so that we no longer have fractions in our data. For example, specifying monetary amounts in cents rather than dollars, or internally representing probabilities as values from 0 to 100 rather than 0 to 1, allows us to use integers instead of real numbers.
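For example, a total price computed in integer cents is exact, whereas the same computation in fractional dollars is subject to round-off. A minimal sketch with hypothetical prices:

    #include <stdio.h>

    int main(void)
    {
        /* Prices in cents: exact integer arithmetic, no round-off */
        int unit_price = 1999;      /* $19.99 */
        int quantity   = 3;
        int total      = unit_price * quantity;

        /* Convert to dollars only when displaying the result */
        printf("Total: $%d.%02d\n", total / 100, total % 100);
        return 0;
    }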
If you must use floating point, use a 64-bit floating point value unless you need to conserve memory (i.e., your program uses large arrays or other in-memory data structures) and you do not need much precision. Modern computers do not take significantly longer to process 64-bit floating point values than 32-bit floating point values.
Keep in mind that the accuracy of your results is usually less than the accuracy you start with, since round-off error accumulates during calculations. 32-bit floating point operations are only accurate to 6 or 7 decimal digits, so results will likely be less accurate than that. 64-bit floating point has up to 16 decimal digits of precision.
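The following minimal C sketch illustrates accumulated round-off: 0.1 has no exact binary representation, so repeatedly adding it drifts far from the true total in 32-bit floating point, and much less in 64-bit:

    #include <stdio.h>

    int main(void)
    {
        float  fsum = 0.0f;
        double dsum = 0.0;
        long   i;

        /* 0.1 is inexact in binary, so error accumulates with each add */
        for (i = 0; i < 10000000; ++i)
        {
            fsum += 0.1f;
            dsum += 0.1;
        }
        printf("32-bit sum: %f\n", fsum);   /* drifts far from 1000000 */
        printf("64-bit sum: %f\n", dsum);   /* very close to 1000000 */
        return 0;
    }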
Among integer types, choose the fastest type that provides the necessary range for your computations. Larger integer types provide greater range, but may require multiple precision arithmetic on some CPUs. Very small types like 8-bit integers may be promoted to larger integers during calculations, which will also slow down the code.
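In C, the types defined in stdint.h make this choice explicit. For example, int_fast32_t requests the fastest integer type with at least 32 bits of range, while int_least16_t requests the smallest with at least 16 bits. A minimal sketch:

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    int main(void)
    {
        /* Fastest type with at least 32 bits: may be 32 or 64 bits wide */
        int_fast32_t  counter = 123456;

        /* Smallest type with at least 16 bits: saves space in large arrays */
        int_least16_t compact = 12345;

        printf("int_fast32_t occupies %zu bytes on this machine\n",
               sizeof(counter));
        printf("counter = %" PRIdFAST32 ", compact = %" PRIdLEAST16 "\n",
               counter, compact);
        return 0;
    }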
The data types offered by C and Fortran generally correspond to what is supported by typical hardware, usually 8, 16, 32, and 64-bit integers, and 32 and 64-bit floating point numbers. Most compilers support 64-bit integers on 32-bit CPUs, in which case operations like addition and subtraction take twice as long, since the CPU has to use two 32-bit machine instructions. This is called multiple precision arithmetic. Likewise, some compilers support 128-bit integers and floating point values, which require multiple machine instructions on 64-bit hardware. If you require even bigger numbers than this, you will need to use a multiple-precision library, whose functions will be significantly slower than hardware-supported arithmetic operations.
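One such library is GNU MP (GMP), which implements arbitrary-size integers in software. A minimal sketch, assuming GMP is installed (compile with -lgmp):

    #include <stdio.h>
    #include <gmp.h>

    int main(void)
    {
        mpz_t   big;

        mpz_init(big);

        /* 2^200 overflows any hardware-supported integer type */
        mpz_ui_pow_ui(big, 2, 200);

        gmp_printf("2^200 = %Zd\n", big);

        mpz_clear(big);
        return 0;
    }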
Integers are usually represented internally in unsigned binary or two's complement signed integer format, since these formats are directly supported by most hardware.
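A short sketch makes the two's complement encoding visible: reinterpreting the bits of a negative 8-bit integer as unsigned reveals the underlying pattern:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        int8_t  neg  = -5;

        /* Same bit pattern viewed as unsigned: 256 - 5 = 251 = 11111011 */
        uint8_t bits = (uint8_t)neg;

        printf("-5 is stored as %u (0x%02x)\n",
               (unsigned)bits, (unsigned)bits);
        return 0;
    }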
Floating point types, which approximate real numbers, are represented using the IEEE standard floating point format in modern CPUs. Some CPUs do not have floating point support at the hardware level, but such CPUs are generally used only in small embedded applications, not for scientific computing.
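As an illustration, the sketch below extracts the sign, exponent, and mantissa fields of a 32-bit IEEE value (1 sign bit, 8 exponent bits, 23 mantissa bits):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        float    f = -1.5f;
        uint32_t bits;

        /* Copy the raw bits of the float into an integer for inspection */
        memcpy(&bits, &f, sizeof(bits));

        printf("sign = %u, biased exponent = %u, mantissa = 0x%06x\n",
               (unsigned)(bits >> 31),
               (unsigned)((bits >> 23) & 0xff),
               (unsigned)(bits & 0x7fffff));
        return 0;
    }

For -1.5, this prints a sign of 1, a biased exponent of 127, and a mantissa of 0x400000.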
Character types are generally processed internally as 8-bit unsigned integers representing the binary ISO character code. Some systems support 16 and 32-bit character sets as well, depending on the locale selected in the operating system configuration. The 16 and 32-bit codes are only needed for languages with large character sets, such as Chinese.
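Since a character is internally just a small integer, its code is easy to inspect:

    #include <stdio.h>

    int main(void)
    {
        char    ch = 'A';

        /* A char is a small integer holding the character code */
        printf("'%c' is stored as code %d\n", ch, ch);  /* 65 in ASCII/ISO */
        return 0;
    }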
Why is data type selection important?
What do we need to know in order to select the optimal data type for a variable?
Why should we use integers rather than floating point whenever possible?
What if the data require the use of fractions, as with probabilities, which are always between 0.0 and 1.0? We have to use floating point, right?
When should we use 32-bit floating point and when should we use 64-bit floating point? Why?
What are the pros and cons of using complex data types?
What is the general approach for selecting an integer type?