Optimize string handling in VB6 - Part I
String handling in Visual Basic is slow if done the wrong way. You can speed up string operations significantly by following some easy rules.
Faster strings with VB6
Visual Basic 6.0 offers a large selection of useful string handling
functions such as
You can overcome many of the speed limitations by clever coding. This article shows a number of good tricks to add speed to string-intensive applications. The tricks use pure VB6 code. No extra run-time files or API calls are necessary.
PA This sign appears where Project Analyzer detects unoptimized coding. Project Analyzer is a VB code analysis tool that finds unoptimal functions and replaces them with better ones.
Who should read this article?
These tips are based on Visual Basic 6.0 and variable-length strings. They're most useful with string-intensive programs that read, parse or manipulate large amounts of text. The performance gains from using these techniques are significant if you're executing the calls thousands or hundreds of thousands of times. If you're just occasionally writing and reading a few strings outside of loops, these tips won't help you much. While the tips work best for VB6, some of them are generic in that they also apply to earlier and later versions of VB.
Why are VB6 strings so slow?
Perhaps the biggest bottleneck is that VB makes copies of the string data when doing some of the operations. Even when you're just reading strings (and not planning to make any modifications), you can easily end up making a large number of copies. The copying costs you time if string processing is an intensive part of your program. Another reason is that some of the widely used functions are implemented in a non-straightforward way. They may be doing more work than your task requires. Fortunately, you can often replace an advanced function with a simpler and faster alternative.
Optimize the empty string
Checking for empty string
PA It's often necessary to test for an empty string. The usual ways are these:
However, VB executes the following equivalent statements much faster.
The replacement is essentially risk-free. Your code executes the same as before, only faster.
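As a minimal sketch of this replacement, assuming the usual tests are comparisons against the empty string literal and the faster ones are LenB tests (as used later in this article). The procedure and variable names are illustrative only.

```vb
Sub EmptyStringTestDemo()
    Dim Text As String   ' illustrative variable, initially empty

    ' Usual way: compare against an empty string literal
    If Text = "" Then Debug.Print "empty (string comparison)"

    ' Faster equivalent: LenB returns the length in bytes, zero means empty
    If LenB(Text) = 0 Then Debug.Print "empty (LenB test)"

    Text = "abc"
    If LenB(Text) <> 0 Then Debug.Print "not empty (LenB test)"
End Sub
```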
VB's implementation of
Note that we use the
Assigning an empty string to a variable
PA This is the usual way to clear a string variable.
What a waste! First of all, the string
So what is this?
For most purposes,
If you call some non-VB API or component, test the calls with vbNullString before distributing your application. The function you're calling might not check for a null string.
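The two ways of clearing a string can be sketched as follows. This is an illustration, not a benchmark; the procedure name is hypothetical. Within VB itself both assignments leave the variable looking empty.

```vb
Sub ClearStringDemo()
    Dim Text As String
    Text = "some text"

    ' Usual way: assign an empty string literal
    Text = ""

    ' Alternative: vbNullString is VB's built-in null string constant
    Text = vbNullString

    ' VB code sees an empty string in both cases
    Debug.Print LenB(Text) = 0   ' True
    Debug.Print Text = ""        ' True
End Sub
```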
No variants please
It's a simple thing but often overlooked. All variables, parameters and functions should have a defined data type. If the data is a string, then the data type should be defined as String. If you don't give a data type, you're using a Variant.
PA So add those
Dollars that make your program run faster
PA The following functions are unoptimal if you're using them on strings:
These are the dreaded
So what's all that
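A short sketch of the difference, using Left and Left$ as the example pair (both are used elsewhere in this article). The plain version returns a Variant, the dollar version returns a String. The procedure name is illustrative.

```vb
Sub DollarFunctionDemo()
    Dim Text As String
    Dim Part As String
    Text = "Hello, world!"

    ' Left returns a Variant, which VB then converts to String on assignment
    Part = Left(Text, 5)

    ' Left$ returns a String directly, no Variant involved
    Part = Left$(Text, 5)

    Debug.Print Part   ' Hello
End Sub
```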
Dollar variables are no good
How about the dollar sign with variables? In the names of your own functions? Does it help?
No. The dollar sign only helps with the above VB functions. In this article we've also used the $ sign to denote a string variable such as Text$.
PA We don't recommend the $ sign for string variables. In real code, you should define your variables (and functions) with a real datatype, such as this:
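A minimal sketch of explicit declarations, placed in a standard module. The names Title and GetGreeting are hypothetical and only serve the illustration.

```vb
' Type-declaration character: works, but not recommended
Dim Text$

' Explicit data type: recommended
Dim Title As String

' Functions get an explicit return type as well
Function GetGreeting() As String
    GetGreeting = "Hello"
End Function
```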
For some reason,
vbNewline is a little bit faster than
The last example ("") is not actually a constant but an escape sequence. You can use "" anywhere in a string to represent a quotation mark. The alternative is Chr(34), which was required in some early BASIC versions where the "" syntax didn't exist.
You can also define other character values to avoid repeated calls to ChrW$(). If the character value is in the ASCII range 0–31, you need to define it as a variable and assign the correct character value before use.
    Dim BEL As String
    BEL = ChrW$(7) ' The BEL character, or ^G
For other characters you can simply use a constant.
Const Percentage = "%"
PA It's obvious, but calling AscW on a string constant makes no sense. The value returned is a constant. It never changes. Instead of Asc("A"), use the value 65. Better yet, define a constant such as:
Const ascA = 65
Use the constant instead of 65 for more legibility. As it happens, VB.NET compiles Asc("A") better, but since we're in VB6, we need to define this constant.
If the same string exists in more than one location in your project, it will also exist in several locations in the executable file, as far as VB6 is concerned (VB.NET joins duplicated strings during compilation).
You can optimize by defining your strings as constants and referencing the constant where you need the string value. This way you save space as each constant gets stored only once. Besides, if you ever consider localizing your program, you have a useful list of string constants to give to the translator.
There is a nasty exception. It doesn't save any space to define constants in terms of other constants.
    Const MSG1 = "Hello, "
    Const MSG2 = "world!"
    Const MSG3 = MSG1 & MSG2
In this case you will actually store the same text twice in the executable. The full text of MSG3 gets stored in addition to MSG1 and MSG2, which is not what you wanted. If you want to save space, concatenate MSG1 & MSG2 at run-time. For speed, store it in a variable for reuse.
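One way to sketch the run-time concatenation with reuse, for a standard module. The cache variable m_Msg3 and the function name Msg3 are hypothetical names chosen for this illustration.

```vb
Const MSG1 = "Hello, "
Const MSG2 = "world!"

Private m_Msg3 As String   ' hypothetical cache for the combined text

Function Msg3() As String
    ' Concatenate once, on first use, then reuse the stored value
    If LenB(m_Msg3) = 0 Then m_Msg3 = MSG1 & MSG2
    Msg3 = m_Msg3
End Function
```

Callers then use Msg3 wherever the constant MSG3 would have been used.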
Also notice that the above applies to string constants only. Numeric constants are also computed and stored in the executable, but string constants are more likely to demand more space: 6 bytes overhead + 2 bytes per character.
PA String literal analysis is a Project Analyzer feature that reports duplicate strings. Follow the link to read more about the elimination of unnecessary string literals.
When compiling to an executable file, VB stores (most) string literals in Unicode, requiring 2 bytes per character. If you want to store your strings 1 byte per character, use resource files instead. This might reduce your executable size considerably if the amount of string data is large.
Note that you need to store the strings as a "custom resource" (binary format), not in the regular resource string table (Unicode). Press the Add Custom Resource button in the VB6 Resource Editor to add a text file as a custom resource.
Resource files are also handy for storing very long strings, multiline strings and strings that may be subject to localization.
Bug alert: If you store strings as a custom resource, make sure the strings consist of plain ASCII characters (0–127). Alternatively, make sure all the users use the same codepage as you. Otherwise the text may look different in a different locale. As an example, instead of the letter Ä a Greek user can see the letter Δ. The default way of storing strings as Unicode avoids this problem.
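Reading such a custom resource back into a string might look like the sketch below. It assumes the text file was added under the resource type name "CUSTOM" and that the caller passes the resource ID shown in the Resource Editor; check both against your own project. LoadResData returns the raw bytes, and StrConv with vbUnicode converts them from the system codepage to a VB string, which is where the codepage issue described above comes from.

```vb
Function LoadTextResource(ByVal ResId As Long) As String
    Dim Bytes() As Byte

    ' "CUSTOM" is an assumed resource type name, adjust it to your project
    Bytes = LoadResData(ResId, "CUSTOM")

    ' Convert 1-byte-per-character data to a Unicode VB string
    LoadTextResource = StrConv(Bytes, vbUnicode)
End Function
```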
Comparing strings against each other may take longer than you expected. Here are a few tricks.
Here are two unoptimized ways to branch on the first character in a string.
    ' Case 1
    If Left$(Text$, 1) = "A" Then

    ' Case 2
    Select Case Left$(Text$, 1)
    Case "A"
    Case "B"
    End Select
Rather than calling Left$(), we can call AscW() to determine the first letter of a string. The following examples are faster:
    ' Case 1
    If LenB(Text$) <> 0 Then
        If AscW(Text$) = 65 Then ' AscW("A")=65

    ' Case 2
    If LenB(Text$) <> 0 Then
        Select Case AscW(Text$)
        Case 65 ' A
        Case 66 ' B
        End Select
AscW() is faster than first calling Left$(), then comparing the result to another string. There's a caveat, however. AscW() on an empty or null string is a run-time error. That's why you must first test with LenB() to rule out that possibility. You can leave out the call to LenB() only if you're certain that the string contains at least one character.
The Select Case structure offers an additional bonus. Having single numbers in the Case conditions is less time-intensive than repeatedly comparing against a string.
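Put together as a complete procedure, the technique could look like this sketch. The function name and the returned texts are illustrative only.

```vb
Function DescribeFirstChar(ByRef Text As String) As String
    ' Guard first: AscW raises a run-time error on an empty string
    If LenB(Text) = 0 Then
        DescribeFirstChar = "empty"
        Exit Function
    End If

    ' Branch on the numeric character code instead of a one-character string
    Select Case AscW(Text)
    Case 65 ' A
        DescribeFirstChar = "starts with A"
    Case 66 ' B
        DescribeFirstChar = "starts with B"
    Case Else
        DescribeFirstChar = "starts with something else"
    End Select
End Function
```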
Similar to the above trick, this is the way to check for a character in the middle of a string.
If AscW(Mid$(Text$, index, 1)) = 65 Then
Note that index must be at least 1 and less than or equal to Len(Text$). Otherwise you get a run-time error. Because Mid$ may return a long string, the third parameter in Mid$(, , 1) is essential for optimization. Without it, Mid$ can spend a lot of time making an unnecessarily long copy of Text$. Part III of this article goes deeper into this issue.
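A guarded version of this check can be sketched as a small helper. CharCodeAt is a hypothetical name, and returning -1 for an invalid index is a choice made for this illustration. AscW returns a signed Integer, so the result is masked with &HFFFF& to keep all valid codes in the range 0 to 65535 and distinct from -1.

```vb
Function CharCodeAt(ByRef Text As String, ByVal Index As Long) As Long
    If Index >= 1 And Index <= Len(Text) Then
        ' The third parameter 1 keeps Mid$ from copying the rest of the string
        CharCodeAt = AscW(Mid$(Text, Index, 1)) And &HFFFF&
    Else
        ' Index out of range: signal it instead of raising a run-time error
        CharCodeAt = -1
    End If
End Function
```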
PA Whenever you can, use binary comparison. This is VB's default. Text comparison is much slower. These statements slow your application down:
    Option Compare Text
    StrComp(, , vbTextCompare)
    InStr(, , , vbTextCompare)
If you need a case-insensitive comparison, consider using LCase$ to do it, especially if it's enough to call it on one parameter only:
    ' Slower
    StrComp(Text1$, "abc", vbTextCompare)

    ' Faster
    StrComp(LCase$(Text1$), "abc", vbBinaryCompare)
In the following case, the two calls to LCase$ remove the performance gain you got above:
StrComp(LCase$(Text1$), LCase$(Text2$), vbBinaryCompare)
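When one side of the comparison stays the same across many calls, you can lower-case it once outside the loop so that each iteration needs only one LCase$ call. The sketch below assumes a hypothetical array search; the function and parameter names are illustrative. Note that an LCase$-based match is a plain case-insensitive comparison and may differ from vbTextCompare for some characters.

```vb
Function CountMatches(Items() As String, ByRef SearchFor As String) As Long
    Dim LowerSearch As String
    Dim i As Long

    ' Convert the fixed operand once, not on every iteration
    LowerSearch = LCase$(SearchFor)

    For i = LBound(Items) To UBound(Items)
        If StrComp(LCase$(Items(i)), LowerSearch, vbBinaryCompare) = 0 Then
            CountMatches = CountMatches + 1
        End If
    Next
End Function
```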
Bear in mind that StrComp(,,vbTextCompare) is more than just a case-insensitive comparison. It's actually built for sorting, not comparing for equality. In many cases, such a locale-dependent textual comparison is overkill and can even lead to subtle errors.
InStr is a nice function to find a string inside another one. Normally you use the plain InStr function, the wide-character version.
There is an optimization with the byte version, InStrB. If you are just going to check whether a string exists inside the other but don't care about the location, you can use the following code:
If InStrB(Text$, SearchFor$) <> 0 Then
Since you only compare the return value against zero, you don't need to worry about conversions between byte-based indices and character indices. This is not the whole story, however. You need to be aware of the following catches:
InStrB works completely on byte-based index values. The return value, as well as the start index parameter (the first numeric parameter, not present in the above call) are both in bytes, not in characters. One character is 2 bytes. Use the equation byteindex = (characterindex * 2) - 1 to convert indices. If there is a match at character 3, the byte index is 5.
InStrB is a byte data function and it's dangerous to use it on character input. If the strings may contain character values outside the range 1–255, be careful. Chances are InStrB is not good for you. As InStrB does a byte-wise search, it can return matches between characters:
    ' Bytes 34 12 78 56 hex
    Text$ = ChrW$(&H1234) & ChrW$(&H5678)
    ' Bytes 12 78
    SearchFor$ = ChrW$(&H7812)
In this case, InStrB returns 2, which is the start of the byte sequence 12 78 but doesn't match any of the input characters. This is probably not what you want when working with strings. Note that even if your strings are plain ASCII, the null character can still pose a problem. Example: InStrB("A" & vbNullChar, vbNullChar) returns 2, not 3 as one might expect.
vbBinaryCompare is simpler to understand.
What does this mean? Use InStrB to optimize only when you fully understand how it works.
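If you need a character position from InStrB, one possible approach is to accept only matches that start on an odd byte index, since characters begin at bytes 1, 3, 5 and so on. The sketch below is an illustration of the index arithmetic, not a tuned routine; InStrAligned is a hypothetical name, and whether it beats plain InStr in your case is something to measure.

```vb
Function InStrAligned(ByRef Text As String, ByRef SearchFor As String) As Long
    Dim BytePos As Long

    ' Nothing to search for: return 0 (a choice made for this sketch)
    If LenB(SearchFor) = 0 Then Exit Function

    BytePos = InStrB(1, Text, SearchFor, vbBinaryCompare)
    Do While BytePos <> 0
        If (BytePos And 1) = 1 Then
            ' Odd byte index: the match starts on a character boundary.
            ' Inverse of byteindex = (characterindex * 2) - 1
            InStrAligned = (BytePos + 1) \ 2
            Exit Function
        End If
        ' Even byte index: match between characters, keep searching
        BytePos = InStrB(BytePos + 1, Text, SearchFor, vbBinaryCompare)
    Loop
End Function
```

With the two examples above, this helper returns 0 for the ChrW$(&H7812) search and 2 for the vbNullChar search.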
The Like operator is not particularly fast. Consider alternatives. We don't have a generic rule to follow here. You need to measure the performance differences between your alternatives. Here is one rule though. It applies if you're looking for a certain string inside another one.
    ' Slower
    If Text$ Like "*abc*" Then

    ' Faster
    If InStr(Text$, "abc") <> 0 Then
You may also use InStrB if you know what you're doing.
Procedure string parameters differ from numeric parameters in that with strings, the chosen parameter passing convention makes a real performance difference.
How should you define procedure parameters for calls from within the same project?
ByVal is slow for string parameters. ByVal makes a copy of the string on every call. The good side is that a ByVal parameter is safe to modify: the modifications aren't passed back to the callers.
ByRef is faster because the string doesn't get copied. The drawback is that you have to be careful. If you don't intend to return a value to the caller in the ByRef parameter, you must take care not to write to this parameter accidentally.
ByRef instead of function return value
There's also an optimization trick for returning a string value. Returning a string as the function return value is the normal practice. However, returning a string in a ByRef parameter is faster. The ByRef trick for return values applies to both functions and Property Get procedures. Here's the usual (and slower) way:
    ' Slow:
    Property Get Name() As String
        Name = m_sName
    End Property

    ' Slow:
    Function Name() As String
        Name = m_sName
    End Function
This way is faster if you have to make a large number of calls:
    ' Fast:
    Sub GetName(ByRef Name_out As String)
        Name_out = m_sName
    End Sub
It's often considered bad programming style to return values in parameters. Normally procedures should not cause side-effects by modifying their ByRef parameters. However, if you want speed, you sometimes have to reject accepted programming practices to win a few CPU cycles. Thus, the optimization objective might justify the loss of style. You can use ByRef, but you should indicate why you're using it. For example, you can mark all output parameters with the word out, or write a comment saying ByRef is used for speed.
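On the calling side, the output parameter pattern might be used as in the following sketch for a standard module. The module-level variable m_sName and the demo procedure are assumptions made to keep the example self-contained.

```vb
Private m_sName As String

' ByRef is used for speed: Name_out is an output parameter
Sub GetName(ByRef Name_out As String)
    Name_out = m_sName
End Sub

Sub GetNameDemo()
    Dim CurrentName As String
    m_sName = "Example"

    ' The caller supplies the variable that receives the result
    GetName CurrentName
    Debug.Print CurrentName   ' Example
End Sub
```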
There is one case where ByRef is slower than ByVal. This happens when passing ByRef to an out-of-process server. The variable has to be marshalled twice, once going into the method and once returning. The implication is to use ByVal for your public server interfaces.
Most of the performance gains by string optimization may actually be due to a limited number of changes in certain key locations in your code. What are these locations and how can you find them?
A critical location is executed thousands or hundreds of thousands of times. It may be inside a loop or a recursive algorithm. The location may consist of a handful of procedures or certain lines of code inside them.
It's not always possible to spot a bottleneck just by looking at the code and searching for loops or recursion. A code profiler, such as VB Watch, is useful for finding the critical locations. It logs the execution times as you run your program. When ready, it tells you which procedures or lines were executed the highest number of times, and which ones took the longest time to execute. These places are the best candidates for manual optimization work. For other parts of the code you can rely on a more automatic optimization method, such as letting Project Analyzer run auto-fix on your code by routinely replacing inefficient calls with better ones.
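Without a profiler, a rough comparison of two alternatives can be made with VB's built-in Timer function, as in the sketch below. The iteration count and procedure name are arbitrary choices for this illustration. Timer has limited resolution and resets at midnight, and results in the IDE can differ from a compiled executable, so treat the numbers as indicative only.

```vb
Sub TimeEmptyStringTests()
    Const Iterations As Long = 1000000
    Dim Text As String
    Dim i As Long
    Dim Hits As Long
    Dim StartTime As Single

    ' Alternative 1: string comparison
    StartTime = Timer
    For i = 1 To Iterations
        If Text = "" Then Hits = Hits + 1
    Next
    Debug.Print "String comparison:"; Timer - StartTime; "seconds"

    ' Alternative 2: LenB test
    StartTime = Timer
    For i = 1 To Iterations
        If LenB(Text) = 0 Then Hits = Hits + 1
    Next
    Debug.Print "LenB test:"; Timer - StartTime; "seconds"
End Sub
```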
Part II introduces you to fast and slow VB functions, Unicode API calls and robust handling of huge strings.