Optimize loops

Loops make programs run slowly. A few unoptimal lines can make an app run at a snail's pace. This article presents performance tricks for squeezing the max speed out of your code. The focus is on processing data arrays in loops. We restructure loops, rebuild function calls, fine-tune conditionals, choose fast operators, pre-calculate values and access arrays the proper way.

This article features tips for the optimization of numeric computation, repetitive execution, recursion and arrays. If your program processes data arrays too slowly, these ideas can give your app a real performance boost. On the other hand, if your application is slow at processing strings, read Optimized string handling instead.

The optimization tips are generic and apply to many programming languages. The code examples are in Visual Basic.

In this article:

Speed up the loops

Loops are often the performance bottlenecks of an application. The key to speed up the program is to make the loops run faster. Nested loops are often where the problems lie in.

Introduction

Consider an imaging application as an example. It stores pixel images in simple arrays such as Data(x,y) or data[x,y]. To read, process and save the images, it needs to process the arrays element by element, pixel by pixel. This is done in two nested loops: the y direction and the x direction. A typical piece of code looks like this:

For y = 0 to Height-1
   ' Iterate through y axis
   For x = 0 to Width-1
      ' Iterate through x axis
      process the pixel at (x,y)
      ...
   Next x
Next y

The inner code runs through each pixel, a total of x*y times, to process the pixels one after the other. As it happens, the program runs at a snail's pace. What can we do about it?

Focus on the inner loop

With nested loops, put your efforts on the innermost loop. It's the one that executes the largest number of times. This is where you usually get the most out of your efforts. Each statement and expression really makes a difference as it executes a million times. Here you can make miracles by eliminating unnecessary statements, choosing the right operators and picking the fastest data types.

Move loop condition to end-of-loop

Most programming languages have at least two types of loops. One is the pre-condition loop where your loop condition is at the start of the loop. The other is the post-condition loop where the condition is at the end.

Loop styles
1. Pre-condition loop2. Post-condition loop
Do While condition=true
   process data
Loop
Do
   process data
Loop While condition=true
Loop executes zero or more times (n≥0) Loop executes one or more times (n≥1)
Condition evaluated n+1 times Condition evaluated n times

Note on loop #2: In some languages, such as Pascal, the post-condition loop must be written with an "until" condition such as repeat..until condition=false. As far as optimization is concerned, the idea is the same.

Here's the optimization trick: If the loop should execute at least once, use a post-condition loop. What's the deal? You save one evaluation! Assume the loop runs n times. Loop #1 evaluates its condition n+1 times, but loop #2 evaluates only n times.

When to use this trick? It makes sense if the loop is nested inside another loop and execution enters the loop a large number of times. It also makes sense if evaluating the loop condition takes a considerable amount of time.

Be sure not to overuse this technique. You can't use a post-condition loop if the loop should be skipped sometimes (executed zero times). Then you must use the pre-condition loop.

Move loop invariants out of the loop

A loop invariant is an expression whose value doesn't change during the loop. In the following example, y*Width is evaluated on every iteration of the inner loop, even though its value is always the same. Width doesn't change during the loop, nor does y change within the inner loop. y does change at each iteration of the outer loop, but not before.

For y = 0 to Height-1
   For x = 0 to Width-1
      '   y*Width is invariant
      i = y*Width + x
      Process i
   Next x
Next y

To make the loop faster, move the invariant out of the inner loop:

For y = 0 to Height-1
   Temp = y*Width
   For x = 0 to Width-1
      i = Temp + x
      Process i
   Next x
Next y

Your compiler might optimize loop invariants for you. Check to see if it really does. Compile both examples and measure the performance. If the performance is the same, you're lucky since you don't need to optimize this manually.

Use pre-calculated values

This technique is a special case of loop invariants. Replace invariant expressions with pre-calculated values (constants). Consider this simple innocent-looking expression:

x = y And Not 192

What's wrong? You possibly evaluate Not 192 on every execution. You can pre-calculate the value as a constant like this:

Const z = Not 192
x = y And z

Again, your compiler might do this for you, so check it out.

Use lookup tables

If you don't have a single constant value, but a set of alternating values, you can't use the loop invariant techniques described above. You can, however, use lookup tables for the same effect. Consider this code:

x = 2 ^ (16 - (y And 15))

We assume y is anything from 0 up to the maximum integer. As you can see, x is calculated as a of power 2, the exponent varying from 1 to 16. (y And 15) varies between 0..15, so the exponent is between 1..16. This expression is relatively slow since it contains the slow exponential operator ^. It is a great case for a lookup table. You pre-calculate the values and store them in an array. Reading from the array is faster than re-evaluating the same 16 values over and over again.

Here's how you build the lookup table:

' Powers of 2 (from 2^16 to 2^1)
Dim Pow(0 To 15) As Long
For i = 0 to 15
   Pow(i) = 2 ^ (16 - i)
Next
...

' Use the lookup table
x = Pow(y And 15)
' same as x = 2 ^ (16 - (y And 15))

If you know in advance the values that y can take, you can expand the lookup table. Instead of (y And 15), calculate the table for (y). In your application, y might take values between 0..255 or 0..65535. In this case it could make sense to pre- calculate a larger table, such as Pow(0 To 255) or Pow(0 to 65535). This way you would get rid of the And 15 part and save some calculation in your loops.

Expand loops

Some loops run faster if you write them out. Consider this simple loop:

For y = 0 to 7
   i(y) = z And 2^y
Next

You can expand this short loop as:

i(0) = z And 1
i(1) = z And 2
i(2) = z And 4
i(3) = z And 8
i(4) = z And 16
i(5) = z And 32
i(6) = z And 64
i(7) = z And 128

This is a speed vs. code size issue. It can make a big difference if the formula is slow. Rewriting it "open" (instead of re-evaluating the formula over and over) can make the code a lot faster. The expression above may be too simple to justify expansion. To make the point clear, consider this artificial example:

For y = 0 to 7
   i(y) = j(0) And 2^y   _
      + j(1) And 2^(y-1) _
      + j(2) And 2^(y-2) _
      + j(3) And 2^(y-3)
Next

Executing this line in VB6 will take a long time. Expanding it using the above technique will boost the performance considerably. You expand each of 2^y, 2^(y-1), 2^(y-2) and 2^(y-3). Since you know these values in advance, you don't need to re-calculate them over and over.

Speed up operations on data

Eliminate unnecessary temporary variables

Programs often store intermediate results in a temporary variable before using it further. Here is a typical example:

For y = 0 to Height-1
   For x = 0 to Width-1
      temp = calculate(x, y)
      Process temp
   Next x
Next y

Many times you can make the program faster by eliminating the temporary variable. You save the effort of unnecessarily storing the value:

For y = 0 to Height-1
   For x = 0 to Width-1
      Process calculate(x, y)
   Next x
Next y

The benefits of doing this vary by the language. Some compilers automatically optimize this for you, so removing temp manually makes no difference.

Another typical example is this:

For y = 0 to Height-1
   For x = 0 to Width-1
      result = calculate(x, y)
   Next x
Next y

Here you don't actually use the value of result for anything. For all your purposes, result is effectively dead. It is only consuming system resources. Again, some compilers will eliminate such an unused variable on your behalf. For languages that don't have this feature, such as the good old Visual Basic, you can use a code analyzer (see Project Analyzer) to locate the dead variables for you.

Choose faster operators

Some mathematical operators take more to run than others. Testing the actual performance in your environment and with your data types is recommended. Here are some examples in VB6:

OperatorSlowerFasterComment
Exponentialx ^ 2
x ^ 3
x ^ 4
x * x
x * x * x
x * x * x * x
The exponential operator is often slow.
Modulox Mod 16x And 15You can rewrite x Mod y as x And (y-1) when x≥0 and y is a power of 2. What happens when x<0 is left as an exercise for the reader.
Divisionx / yx \ yInteger division can run faster than floating point division, depending on the arguments.

Note: You may need to write x*x*x*x as temp=x*x : result=temp*temp. It depends on how well your compiler optimizes it out.

Choose faster data types

Picking the right data type can make a huge difference. One might assume, for example, that a smaller data type (such as Byte) is faster than a larger data type (such as 32-bit Long integer). Well, this isn't the case. In VB6, Long is the fastest integer data type. It's also the native data type of the processor. Even if the value fits in a byte, put it in a Long if maximum speed is what you are after. (This advice doesn't necessarily work for large arrays, where there is the tradeoff between memory and speed.)

If you can, use integer data instead of floating point. Unnecessary use of floating point values makes a big performance hit.

With data types, there is a classic tradeoff between memory usage and speed. The article Save memory discusses variable data types and their memory consumption.

Optimize the control flow

Control flow optimization is an important part of giving your programs a performance boost. Sometimes you need to twist your code so that it's harder to read but runs faster.

If..Else..Then

The order of the If..Then..Else branches is not insignificant. The usual way to code is to make the code look natural to humans. This isn't always the fastest way. Did you know the Then block runs faster than the Else block (in VB6)? It makes a difference if the other block executes more often than the other. Consider this code:

If condition Then
   rare_block
Else
   usual_block
End If

Let's assume you have noticed that the program usually jumps to the usual_block. The rare_block executes only a few times. Unfortunately, jumping down to the usual_block takes too much time. Rewrite this as:

If Not condition Then
   usual_block
Else
   rare_block
End If

Now the usual_block is on top and you just saved someone from getting fed up with your slow program.

The same idea works also for Select Case. Put the most frequently used Case block at the top. Again, this rule works so for VB6. Check it out with your language before optimizing.

Short-circuit conditionals

Short-circuiting speeds up complex conditional statements. You can use it for conditionals with AND/OR operators.

In languages where operators AND/OR are not short-circuited, executing an expression such as "x And y" and "x Or y" evaluates both x and y before deciding what the outcome is. If the operators are short-circuited, y is not necessarily evaluated at all. The following table describes this in detail.

Evaluation of operators AND/OR
Op. Not short-circuited Short-circuited Flow chart
x And y
  • 1. Evaluate both x and y.
  • 2. If both are True, return True.
  • 3. Otherwise return False.
  • 1. Evaluate x.
  • 2. If x is False, return False.
  • 3. Otherwise evaluate y.
  • 4. If y is False, return False, otherwise return True.
Flow chart of short-circuited AND operation
x Or y
  • 1. Evaluate both x and y.
  • 2. If both are False, return False.
  • 3. Otherwise return True.
  • 1. Evaluate x.
  • 2. If x is True, return True.
  • 3. Otherwise evaluate y.
  • 4. If y is False, return False, otherwise return True.
Flow chart of short-circuited OR operation

For optimization purposes the benefit of short-circuiting is that you sometimes get away without evaluating the latter operand (y). If y happens to be a time-intensive operation, such as a function call, your program executes faster.

The potential pitfall with short-circuiting is that y is not always evaluated (function not called). This could introduce a bug that is hard to notice.

Languages differ in whether their operators are short-circuited or not. In Visual Basic 6.0 there is no short-circuiting. In C, on the other hand, the operators && and || are always short-circuited. In VB.NET, you have a choice. And is not short-circuited, AndAlso is. Or is not short-circuited, OrElse is.

In languages that don't have short-circuited operators, you can simulate them with the following constructs:

Not short-circuited Short-circuited
AND
If x And y Then DoIt
If x Then
   If y Then
      DoIt
OR
If x Or y Then DoIt
If x Then 
   DoIt
ElseIf y Then
   DoIt
End If

Inline function calls

Repeated function calls add overhead. Each time you call a function, you spend a little time for things like passing parameters and saving the function return value.

Inlining means you copy & paste the called function into the caller. [Well, a simple copy & paste is not always enough, you need to take the parameters and local variables into account, too.] It works best if the inlined function is small and not very complex. If the function to inline is a complex one, the amount of rework (refactoring) may be considerable.

Inlining is easier with languages that support macros or special inlined functions. Macros and inlined functions are automatically inlined for you at compile-time, so you don't need to do it manually.

Remove recursion

Recursion is a nice technique where a function calls itself repeatedly until the work is done. It's often the most elegant way to code some algorithm.

Unfortunately, recursion also adds the overhead of a function call. If the recursion tree gets deep, you spend the time with the calls instead of the real computation.

You remove recursion by replacing it with a loop. Sometimes it is enough to use GoTo to jump to the start of the function. Sometimes you need to do more, like arrange parameter values and local variables.

Visual Basic 6.0 optimization tricks

The following lists a number of VB6 specific optimization tricks. Other than VB6 coders can skip them.

VB6 file access tricks

There is no need to read/write files byte by byte. You can read/write large blocks at a time.

Read byte array from file

' Allocate space for count bytes
ReDim array(1 to count) As Byte
' Read count bytes from file
Get #filenum,, array()

Read entire file to byte array

' Allocate space for entire file
ReDim array(1 to LOF(filenum)) As Byte
' Read entire file
Get #filenum, 1, array()

Save byte array to file

Put #filenum,, array()

VB6 CopyMemory

Copying large data blocks in VB6 is slow. You can use a Windows system call to do that. CopyMemory copies a given number of bytes from one memory location to the other.

Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (Destination As Any, Source As Any, ByVal Length As Long)

CopyMemory is a bit tricky to get right the first time. Here's how you use the function.

CopyMemory Destination, Source, numbytes

CopyMemory: Copy an array

' Source data is numbytes in size
ReDim Source(0 to numbytes-1)
' Allocate memory for destination array
ReDim Destination(0 to numbytes-1)

CopyMemory Destination(0), Source(0), numbytes

It is important that there are at least numbytes bytes of data in both Destination() and Source(). Otherwise your program will crash. The equivalent VB code for array copy is:

Destination() = Source()

This the safer way to copy an array, but it's also the slower way.

CopyMemory: Assign a byte array to UDT

You can copy a byte array over a user-defined type (UDT) variable. This way you can access the various fields of the UDT directly without having to interpret the data from the raw bytes. Here is how you do it:

' Target variable
Dim UDT as MyUDT
' Source data in byte array, size LenB(UDT) bytes
ReDim Source(0 to LenB(UDT)-1)
' Copy source to target
CopyMemory UDT, Source(0), LenB(UDT)

In plain English, you copy exactly the UDT size bytes from the byte array over the UDT. The call LenB(UDT) returns the number of bytes required. The data will automatically arrange itself into the various fields of the UDT. — Note how you need to call LenB instead of Len. Keep in mind that Len ≤ LenB. UDTs can contain extra alignment bytes or Unicode Strings, which occupy 2 bytes per character. Len doesn't count the extra bytes. The effect is that you get an incomplete copy with the end bytes not being copied at all.

You can also do it vice versa. Once you have populated the UDT fields, copy it to a byte array.

CopyMemory and type conversions

You can do many other unorthodox type conversions with CopyMemory. You can, for example, copy a 2-dimensional array over a 1-dimensional array. This way you can convert a 2D (x,y) image into a simple list of bytes for further processing.

VB6 ReDim Preserve: Use sparingly

When you want to reallocate more (or less) space for an array, you use ReDim Preserve. Unfortunately, this is a slow command. The problem is worse if you call ReDim Preserve repeatedly during processing.

You are better off allocating enough space first. When the processing is done, deallocate the extra by a single ReDim Preserve.

Read more: Allocate arrays wisely

VB6 With..End With block

You may have heard that the With..End With block saves typing and also time. Did you know it's a slow statement? You don't want to use it just to access a single field. Consider the With block below:

With array(x)
   If .a = 0 Then
      .a = 1
      .b = 2 ...

If the condition .a=0 is rarely met, the With statement may take a considerable amount of extra time. It may run faster rewritten as:

If array(x).a = 0 Then
   With array(x)
      .a = 1
      .b = 2 ...

VB6 Pass initial values as parameters

You can save time by passing initial variable values as parameters. Consider the following example procedure, which uses a local variable x with an initial value of 2:

Sub Test()
   Dim x As Long
   x = 2
   ...
End Sub

This doesn't look too bad, really. But wait... you can make it faster! You can pass the initial value as a parameter:

Sub Test(ByVal x as Long)
   ...
End Sub

' Caller:
Test 2

The last line calls Test giving x=2 as the initial value, the way we wanted. As strange as it may look, this actually runs faster. (By the way, make it ByVal, not ByRef, unless you're passing a String.)

The trick works because VB6 doesn't have a variable initializer clause. Integer variables are initialized to zero. When you want a non-zero initial value, you need to execute an assignment statement (x=2). When you pass it as a parameter, there is no assignment statement – and your program runs faster!

This is nowhere near a good programming practice. After all, callers must pass 2 as the parameter, otherwise the program will fail. On the other hand, end-users will surely value faster execution over nice coding. Just document it well.

VB6 Use zero initial values

As an alternative to the above trick, you can pick local variables so that their initial values are zero. Instead of a 1-based array index use a 0-based index, for example.

VB6 variables and arrays

Static variables are soooo static

Avoid the use of Static local variables. They are far slower than regular Dim variables. Avoid the use of the Static keyword altogether. Use module-level variables as a replacement.

Module-level or local variables?

Oddly enough, the speed difference between accessing a module-level variable (class variable) vs. a local variable is not constant. Sometimes it's faster to read/write a local, sometimes a module-level variable. Experiment.

You can replace a module-level variable with a local copy. Use a local temporary variable and write the changes back at the end of the procedure. If this makes sense, that is.

Dynamic array is faster than fixed-size array

Dynamic arrays are faster to access than static arrays. Consider the following alternatives:

Static arrayDynamic array
Fixed size array.Array size can change during execution.
Dim a(10000) As Byte Dim a() As Byte
ReDim a(10000)

It is faster to access the array you allocated with Dim+ReDim. Note that you do need the Dim before ReDim! ReDim alone is not enough. Use Dim to define the data type and ReDim to define the size. ReDim cannot change the data type of an array, so there is no need to put the data type in the ReDim statement. Putting it there will do no harm, though.

(End of Visual Basic 6.0 specific tricks)

General considerations

Don't process the data unless you have to. This sounds obvious, but sometimes you see code that runs in vain. Consider a function that arranges an array of bytes. If the bytes are already in the correct arrangement, there is no reason to rearrange. An example is a sort routine that sorts data that was already sorted, or an imaging application that optimizes the image palette even if it was optimal already.

Process as little data as you can. Consider an imaging application that compresses image bytes. The fewer bytes there are to compress, the faster it runs. To keep the amount of data down, use the lowest color depth (bits per pixel) possible. The image can be represented with fewer bytes and the compression runs faster.

Pre-process the data. If your code spends its time doing lookups on data, store the data so that the lookups are faster. As an example, you can sort the data and do the lookups in the sorted data, which is generally much faster than searching in unsorted data.

Use available system functions. They often run faster than what you can possibly code. Learn the various functions of your programming language, framework and the operating system. As an example, searching for a character in a string in VB6 is faster done with the built-in InStr function than reading through the string character by character.

Optimization requires documentation

Code optimization can make the program look like spaghetti. The more you optimize, the harder the code gets to read. When you replace recursion by a GoTo, inline function calls, rearrange If..Then..Else blocks, replace clear formulas with mysterious optimized versions, the code gets more and more complex for the reader. This makes it a pain to maintain.

Therefore, remember to document your work. Even if the original code was clear enough to go without comments, the optimized version probably needs them badly. Comment out the old code and write an explanation what was changed and why. If the optimized code is too hard to understand, the original lines can make it clear how it used to work. Explain why you chose And instead of Mod and what the reason was.

Keep in mind that optimizing a piece of code could change the requirements of calling it. As an example, replacing x Mod y with x And (y-1) requires that x ≥ 0 and y is a power of 2. Similarly, replacing floating point division with integer division could make a difference with some special input values (due to rounding). Moving the loop condition to end-of-loop absolutely requires the loop must execute at least once. Document the changed requirements as you optimize. It will make life much easier later on.

Finding the bottlenecks

Most of the performance gains are often due to a limited number of changes in certain key locations in your code. What are these locations and how can you find them?

A critical location is executed thousands or hundreds of thousands of times. It may be inside a loop or a recursive algorithm. The location may consist of a handful of procedures or certain lines of code inside them.

It's not always possible to tell a bottleneck by just looking at the code searching for loops or recursion. A code profiler, such as VB Watch, is useful for finding the critical locations. It logs the execution times as you run your program. When ready, it tells you which procedures or lines were executed the highest number of times, and which ones took the longest time to execute. These places are the best candidates for manual optimization work.

Optimize loops
URN:NBN:fi-fe20061459