Performance

Optimizing loops comes down to two things:

There are many ways to achieve these two goals, and you should use your imagination to find as many ways as possible. A few examples are outlined below.

Usually, the best way to minimize the number of iterations is by using a better algorithm. For example, a selection sort performs roughly N2 iterations to sort a list of N elements, while a heap sort, quick sort, and merge sort use roughly N * log(N), a much smaller number when N is large.

In some cases, we may be able to eliminate the entire loop and replace it with a direct calculation. For example, we might be able to solve f(x) = 0 directly rather than use Newton's method to iteratively find a solution.

To speed up loops, remove unnecessary code inside the loop. Sometimes, code inside a loop can simply be moved out of the loop, so that it is executed once rather than repeatedly.

// Some bad code
#define MAX 1000000000

for (c = 0; c < MAX; ++c)
{
    /*
     *  This if check is executed a billion times, but it doesn't
     *   need to be executed at all
     */
    if ( c == 0 )
    {
        // case 1
    }
    else
    {
        // case 2
    }
}
        
// Some better code
// case 1

for (c = 1; c < MAX; ++c)
{
        // case 2
}
        

If you see an if inside a loop, be suspicious. Very often, the code can be restructured to eliminate the overhead of conditionals inside a loop.

// Some bad code
#define MAX 1000000000

for (c = 0; c < MAX; ++c)
{
    /*
     *  This if check is executed a billion times, but it doesn't
     *   need to be executed at all
     */
    if ( c < SOME_VAL )
    {
        // case 1
    }
    else
    {
        // case 2
    }
}
        
// Some better code
// case 1

for (c = 0; c < SOME_VAL; ++c)
{
        // case 1
}

for (c = SOME_VAL+1; c < MAX; ++c)
{
        // case 2
}
        

Sometimes, doing the opposite of the above example can speed up the code. If we can combine small loops that iterate over the same range, without adding a conditional, we will eliminate duplicated loop overhead.

If a loop is extremely complex and uses large amounts of memory far beyond the size of cache, it might benefit from being broken into smaller pieces that each use less memory. This will reduce the average memory access time.

Use the performance tips covered in previous sections, such as using integer variables instead of floating point. If you must use if statements inside a loop, make sure the condition is false more often than true. These optimization techniques have their greatest impact inside the most intensive loops, where their benefit may be repeated billions of times.

A loop that checks the condition at the end is actually slightly faster than one that checks at the beginning. This is due to the way the code is translated to machine language. A while or for condition is converted to a machine instruction that performs the converse comparison and jumps past the end of the loop if it is true. At the end of the loop, we then need an unconditional branch instruction to jump back to the start.

    c = 0;
    while ( c < 10 )
    {
        // Body
        ++c;
    }
        
        // We need c and 10 in registers for the bge instruction
        ld      t0, c
        li      t1, 10
loop:   bge     t0, t1, done    // while ( c < 10 )
        
        // Body
        
        addi    t0, t0, 1       // ++c
        j       loop            // Extra overhead
done:   
        

A do-while does not need this extra jump instruction:

    c = 0;
    do
    {
        // Body
        ++c;
    }   while ( c < 11 );
        
        // We need c and 10 in registers for the bge instruction
        ld      t0, c
        li      t1, 11
loop:   // Body

        addi    t0, t0, 1
        blt     t0, t1, loop