Optimizing loops comes down to two things:
Minimizing the number of iterations, or eliminate the loop entirely by using better algorithms.
Minimizing the run time of each iteration by stripping down and optimizing the body of the loop as much as possible.
There are many ways to achieve these two goals, and you should use your imagination to find as many ways as possible. A few examples are outlined below.
Usually, the best way to minimize the number of iterations is by using a better algorithm. For example, a selection sort performs roughly N2 iterations to sort a list of N elements, while a heap sort, quick sort, and merge sort use roughly N * log(N), a much smaller number when N is large.
In some cases, we may be able to eliminate the entire loop and replace it with a direct calculation. For example, we might be able to solve f(x) = 0 directly rather than use Newton's method to iteratively find a solution.
To speed up loops, remove unnecessary code inside the loop. Sometimes, code inside a loop can simply be moved out of the loop, so that it is executed once rather than repeatedly.
// Some bad code #define MAX 1000000000 for (c = 0; c < MAX; ++c) { /* * This if check is executed a billion times, but it doesn't * need to be executed at all */ if ( c == 0 ) { // case 1 } else { // case 2 } }
// Some better code // case 1 for (c = 1; c < MAX; ++c) { // case 2 }
If you see an if
inside a loop, be suspicious.
Very often, the code can be restructured to eliminate the
overhead of conditionals inside a loop.
// Some bad code #define MAX 1000000000 for (c = 0; c < MAX; ++c) { /* * This if check is executed a billion times, but it doesn't * need to be executed at all */ if ( c < SOME_VAL ) { // case 1 } else { // case 2 } }
// Some better code // case 1 for (c = 0; c < SOME_VAL; ++c) { // case 1 } for (c = SOME_VAL+1; c < MAX; ++c) { // case 2 }
Sometimes, doing the opposite of the above example can speed up the code. If we can combine small loops that iterate over the same range, without adding a conditional, we will eliminate duplicated loop overhead.
If a loop is extremely complex and uses large amounts of memory far beyond the size of cache, it might benefit from being broken into smaller pieces that each use less memory. This will reduce the average memory access time.
Use the performance tips covered in previous sections, such as
using integer variables instead of floating point. If you must
use if
statements inside a loop, make sure the
condition is false more often than true. These optimization
techniques have their greatest impact inside the most intensive
loops, where their benefit may be repeated billions of times.
A loop that checks the condition at the end is actually slightly
faster than one that checks at the beginning. This is due to the
way the code is translated to machine language. A while
or for
condition is converted to a machine instruction
that performs the converse comparison and jumps past the end of
the loop if it is true. At the end of the loop, we then need an
unconditional branch instruction to jump back to the start.
c = 0; while ( c < 10 ) { // Body ++c; }
// We need c and 10 in registers for the bge instruction ld t0, c li t1, 10 loop: bge t0, t1, done // while ( c < 10 ) // Body addi t0, t0, 1 // ++c j loop // Extra overhead done:
A do-while
does not need this extra jump instruction:
c = 0; do { // Body ++c; } while ( c < 11 );
// We need c and 10 in registers for the bge instruction ld t0, c li t1, 11 loop: // Body addi t0, t0, 1 blt t0, t1, loop