Optimize string handling in VB6 - Part III

Processing strings with Visual Basic 6.0 can get considerably faster if you know the tricks. Part III of this article studies the performance of Left$, Mid$ and Right$ in detail. We learn to quickly examine individual characters with Asc and AscW. We check out the differences between Asc/AscW and Chr/ChrW. We also see how a badly placed pair of extra parentheses can degrade performance.

Part I | Part II | Part III

Part III of this article goes deep into the specifics of string processing in Visual Basic 6. Read the previous parts to understand the basics of optimization and how strings work in VB6.

VB6 functions in this article: Asc, AscW, Chr$, ChrW$, Left$, Mid$, Right$.

Left$, Mid$ and Right$

Functions Left$, Mid$ and Right$ are essential in string processing. If you've read Part I of this article, you already know that the string versions ($) of these functions run faster than the variant versions (without $). Since these functions are so important, let's take a deeper look into the various functions, their parameters and what exactly it is in these functions that may slow down your apps.

These functions return a partial copy of the input string (s). The input gets copied to the output in whole or in part. Usually in part.

Left$(s, n)
Mid$(s, x)
Mid$(s, x, n)
Right$(s, n)

Left$ returns n characters from the start of s.
Mid$ with 2 parameters returns the rest of s starting with position x.
Mid$ with 3 parameters returns n characters of s starting with position x.
Right$ returns n characters from the end of s.
Note: The functions actually return at most n characters. If the string is too short, the output is shorter than n.

In this article we will use the following variables:
s = input string
n = length of output string (number of characters)
x = start position for Mid$
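To see these definitions in action, here is a small sketch that checks each call on a six-character string, including the truncation described in the note above. DemoSubstrings is a hypothetical name. Run it in the VB6 IDE, because Debug.Assert is ignored in compiled programs.

```vb
' Checks what Left$, Mid$ and Right$ return for a 6-character string.
' Debug.Assert stops execution in the IDE when a condition is False;
' compiled programs ignore it.
Public Sub DemoSubstrings()
    Dim s As String
    s = "abcdef"
    Debug.Assert Left$(s, 2) = "ab"        ' first 2 characters
    Debug.Assert Mid$(s, 3) = "cdef"       ' rest of s from position 3
    Debug.Assert Mid$(s, 3, 2) = "cd"      ' 2 characters from position 3
    Debug.Assert Right$(s, 2) = "ef"       ' last 2 characters
    ' Input too short: the result has fewer than n characters
    Debug.Assert Left$(s, 10) = "abcdef"
    Debug.Assert Mid$(s, 5, 10) = "ef"
    Debug.Assert Right$(s, 10) = "abcdef"
    Debug.Print "All checks passed"
End Sub
```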

Avoid a useless full copy

Quite obviously, if Left$, Mid$ or Right$ return a full copy of the whole of s, there is no point in calling them at all. You might as well use s. The following calls are useless:

Left$(s, n)   when n>=Len(s)
Mid$(s, 1)    always
Mid$(s, 1, n) when n>=Len(s)
Right$(s, n)  when n>=Len(s)

If such calls are possible, test for n<Len(s) before calling these functions.
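A typical place for this test is a prefix comparison. In the sketch below, StartsWith is a hypothetical helper, not a VB6 built-in. When the prefix p is at least as long as s, Left$ would only return a full copy of s, so the function compares s directly instead. Both branches give the same result; the comparison follows the module's Option Compare setting.

```vb
' Returns True if s begins with p. StartsWith is a hypothetical helper,
' not a VB6 built-in. Comparison follows the module's Option Compare setting.
Public Function StartsWith(ByRef s As String, ByRef p As String) As Boolean
    If Len(p) < Len(s) Then
        ' Left$ makes a genuine partial copy of s
        StartsWith = (Left$(s, Len(p)) = p)
    Else
        ' Left$(s, Len(p)) would return all of s, so skip the copy
        StartsWith = (s = p)
    End If
End Function
```

For example, StartsWith("abcdef", "abc") is True, while StartsWith("ab", "abc") is False and returns without copying anything.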

Performance of Left$, Mid$ and Right$

Now, what are the essential factors affecting the performance of Left$, Mid$ and Right$? Is it the size of input (s), the size of output (n), the position of output (x) or the choice of the function (Left$, Mid$ or Right$)?

To test this, we created a small VB6 program that executed each of Left$, Mid$ and Right$ 5 million times. As input s we used strings of 1, 10, 100, 1000 and 10000 characters. As output sizes we used n = 1, 10, 100, 1000 and 10000, but never longer than s. In addition, we tested whether Mid$ performs differently at the start of the string than at the end.
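A measurement of this kind can be sketched with VB6's Timer function. This is a minimal illustration, not necessarily how our test program was written; TimeLeft and its parameters are hypothetical. For realistic figures, run the compiled program rather than the IDE, and use enough iterations, because Timer has limited resolution.

```vb
' Times Left$(s, n) over a number of iterations and returns elapsed seconds.
' TimeLeft is a hypothetical name. Timer counts seconds since midnight,
' so a run that crosses midnight gives a meaningless result.
Public Function TimeLeft(ByVal inputLen As Long, ByVal n As Long, _
                         ByVal iterations As Long) As Single
    Dim s As String, t As String
    Dim i As Long
    Dim started As Single
    s = String$(inputLen, "x")      ' input string of inputLen characters
    started = Timer
    For i = 1 To iterations
        t = Left$(s, n)             ' the call being measured
    Next
    TimeLeft = Timer - started
End Function
```

To see the effect of the output size, compare Debug.Print TimeLeft(10000, 10, 5000000) with Debug.Print TimeLeft(10000, 10000, 5000000).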

Performance test results: Left$, Mid$ and Right$

  1. Output size (n) dictates speed. The longer the returned string, the slower the functions run.
  2. High n equals slow performance.
  3. Input size (length of s) has no effect.
  4. Middle parameter to Mid$(.., x, ..) has no significant effect when output size is the same.
  5. Left$ and Right$ run faster than Mid$. The difference is only significant with small output sizes (1, 10 and 100). With large output (1000 and 10000) the speed difference is negligible.
  6. Mid$ with 2 parameters is marginally faster (a few percent) than Mid$ with 3 parameters if output is the same.
  7. Left$, Mid$ and Right$ run in O(n) time, where n is the number of characters returned.

In summary, Left$, Mid$ and Right$ spend their time making a (partial) copy of the input string. Copying is the performance bottleneck. Now, how can we take advantage of these findings?

Guidelines for Left$, Mid$ and Right$

Left$(s, n)       Mid$(s, x)       Mid$(s, x, n)       Right$(s, n)
  • Don't copy too many characters. Keep n as low as possible.
  • Use Left$ and Right$ where you would intuitively use them. Don't use Mid$ instead.
  • Replace Mid$(s, 1, n) with Left$(s, n).
  • Replace Mid$(s, x) with Mid$(s, x, n) if truncated output is OK. This limits the output size to a reasonable n. Note that this optimization is impossible when you need the end of a string. See next.
  • To retrieve the end of a string, Right$(s, n) is fastest, then Mid$(s, x), then Mid$(s, x, n). Note that since the functions take different parameters, accurately computing the parameters may pose an intellectual challenge.
  • Where Mid$(s, x, n) returns the end of a string, replace it with Right$(s, n). This change is risky, since the calls are not exactly the same. Mid$ may return less than n characters depending on x. Make sure you don't add a bug with this optimization.
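The parameter arithmetic behind these guidelines is easiest to check with a few assertions. The sketch below (DemoTail is a hypothetical name) shows the Mid$(s, 1, n) to Left$ replacement, the Right$ equivalent of Mid$(s, x), and why a naive Right$(s, n) is the wrong replacement for a Mid$(s, x, n) call that runs past the end of s.

```vb
' Checks the replacements suggested in the guidelines on a sample string.
Public Sub DemoTail()
    Dim s As String, x As Long
    s = "abcdef"
    ' Mid$(s, 1, n) -> Left$(s, n)
    Debug.Assert Mid$(s, 1, 4) = Left$(s, 4)
    ' Mid$(s, x) -> Right$(s, Len(s) - x + 1), valid for 1 <= x <= Len(s) + 1;
    ' a larger x makes the length negative and Right$ raises run-time error 5
    x = 3
    Debug.Assert Mid$(s, x) = Right$(s, Len(s) - x + 1)   ' both "cdef"
    ' The risky case: Mid$(s, x, n) that runs past the end of s
    x = 5
    Debug.Assert Mid$(s, x, 10) = "ef"
    Debug.Assert Right$(s, 10) = "abcdef"                 ' naive Right$(s, n): wrong
    Debug.Assert Right$(s, Len(s) - x + 1) = "ef"         ' correct length
End Sub
```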

Asc and AscW nested with Left$, Mid$ and Right$

The functions Asc and AscW are frequently used together with Left$, Mid$ and Right$ to tell what a specific character is. Asc and AscW are indeed good, fast functions for this purpose. (AscW is actually faster, but we'll go into that a bit later.)

In the following, what is said about AscW also applies to Asc.

AscW(Left$(..)) is useless

Don't nest AscW and Left$. It makes no sense. For n>=1, AscW(Left$(s, n)) is equivalent to AscW(s). Since AscW only looks at the first character of string s, the call to Left$ just slows down your program without doing anything useful.

AscW(Mid$(..)) considerations

Don't copy too many characters with Mid$. AscW only examines the first character anyway. AscW(Mid$(s, x, 1)) is the best call. Note that if you call AscW(Mid$(s, x)) without the third parameter, Mid$ executes slowly when s is a long string.

Example of potentially slow code:

For x = 1 To Len(s)
   If AscW(Mid$(s, x)) = ... Then ...
Next

The above is better written as:

For x = 1 To Len(s)
   If AscW(Mid$(s, x, 1)) = ... Then ...
Next

The performance difference becomes apparent when s is a fairly long string. If you don't test your program with long inputs, you might not notice the performance problem. Users with long inputs will notice it.
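Here is a complete version of the loop above that counts how many times a given character occurs. CountCharCode is a hypothetical helper; the character is identified by its AscW value, for example 44 for a comma.

```vb
' Counts the characters in s whose AscW value equals code.
' CountCharCode is a hypothetical helper name.
Public Function CountCharCode(ByRef s As String, ByVal code As Integer) As Long
    Dim x As Long
    For x = 1 To Len(s)
        ' Mid$(s, x, 1) copies one character; Mid$(s, x) would copy the whole rest of s
        If AscW(Mid$(s, x, 1)) = code Then CountCharCode = CountCharCode + 1
    Next
End Function
```

CountCharCode("a,b,,c", 44) returns 3. Each iteration copies one character, so the work grows linearly with Len(s). With Mid$(s, x) the copies would add up to Len(s) * (Len(s) + 1) / 2 characters, which is why the slow version only hurts with long inputs.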

AscW(Right$(..)) considerations

When calling AscW(Right$(s, n)) we need to consider n, the number of characters returned by Right$. Performance problems don't exist when n is small. When s is a long string and n can get large, Right$ should not be used. This call is less than optimal:

AscW(Right$(s, n))

Replace Right$ with Mid$ that returns one character only. Here:

AscW(Mid$(s, Len(s) - n + 1, 1))

Both calls examine the nth character from the end of s. The first call copies n characters, while the second call copies just one character.
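The same idea as a reusable function. CharCodeFromEnd is a hypothetical name. It assumes 1 <= n <= Len(s), because otherwise Mid$ or AscW raises run-time error 5 (Invalid procedure call or argument).

```vb
' Returns the AscW value of the nth character from the end of s,
' copying a single character instead of n characters.
' Assumes 1 <= n <= Len(s); otherwise Mid$ or AscW raises run-time error 5.
Public Function CharCodeFromEnd(ByRef s As String, ByVal n As Long) As Integer
    CharCodeFromEnd = AscW(Mid$(s, Len(s) - n + 1, 1))
End Function
```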

Use Unicode: AscW and ChrW$

VB6 works internally with Unicode. Every string is in Unicode, which takes 2 bytes per character. Unicode makes a developer's life simpler. Unfortunately it doesn't make a VB6 developer's life any simpler! While VB6 uses Unicode for strings, it still uses Ansi for input, output and forms.
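You can observe the 2-bytes-per-character layout with LenB, which returns the number of bytes in a string. StrConv with vbFromUnicode converts a string to its Ansi form, which takes one byte per character for plain ASCII text. DemoStringBytes is a hypothetical name.

```vb
' Len counts characters; LenB counts bytes. Internally each character
' takes 2 bytes. StrConv(..., vbFromUnicode) returns the Ansi form,
' which for plain ASCII text takes 1 byte per character.
Public Sub DemoStringBytes()
    Dim s As String
    s = "abc"
    Debug.Assert Len(s) = 3
    Debug.Assert LenB(s) = 6
    Debug.Assert LenB(StrConv(s, vbFromUnicode)) = 3
End Sub
```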

For historical reasons, many VB developers stick to the good old Asc() and Chr$() unless they intend to write international applications. You don't have to be writing international applications to take advantage of a couple of Unicode optimizations. If you're concerned about speed, use the "wide" Unicode versions of these functions: AscW() and ChrW$().

  • AscW() is not the same as Asc(). They can return different values for the same character.
  • ChrW$() is not the same as Chr$() either. They expect different parameter values for the same character, and for the same input value they can return different characters.

The good news for string optimizers is that for characters in the plain old ASCII range (0-127), AscW equals Asc and ChrW$ equals Chr$. Very good! So go ahead and replace Asc with AscW and Chr$ with ChrW$, as long as you stay within the 0-127 range.
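As an example of the safe 0-127 range, here is a sketch that parses the leading digits of a string with AscW and builds a digit character with ChrW$. ParseDigits and DigitChar are hypothetical helpers. The digit codes 48-57 are in the ASCII range, so Asc and AscW (and Chr$ and ChrW$) would give identical results here.

```vb
' Reads the leading ASCII digits of s as a number, e.g. "123abc" -> 123.
' ParseDigits is a hypothetical helper. No overflow check: inputs with
' 10 or more digits can overflow the Long.
Public Function ParseDigits(ByRef s As String) As Long
    Dim x As Long, c As Integer
    For x = 1 To Len(s)
        c = AscW(Mid$(s, x, 1))
        If c < 48 Or c > 57 Then Exit For      ' stop at the first non-digit
        ParseDigits = ParseDigits * 10 + (c - 48)
    Next
End Function

' The reverse direction: the digit character for 0 <= d <= 9.
' DigitChar is a hypothetical helper.
Public Function DigitChar(ByVal d As Integer) As String
    DigitChar = ChrW$(48 + d)
End Function
```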

Suggestion: Run Project Analyzer on your code to detect the slower Asc and Chr versions for replacement.

Asc/AscW and Chr/ChrW tables

What if you need to work outside the range 0-127? Differences (bugs) will show up if you mix Asc/AscW or Chr/ChrW outside that range.

The differences are easiest to understand in the form of codepage tables. The following tables show what Asc and AscW return for characters in the range 128-255. If you are not familiar with codepages: English-speaking users and those in Western Europe and the Americas usually use codepage 1252 (Latin I). After the tables, we discuss how to convert Asc to AscW and Chr to ChrW, and the problems to expect.

ch = character
Asc = ANSI value of ch, specific to the codepage
AscW = Unicode value of ch, independent of the codepage
Example using the first table: Asc("€") = 128, AscW("€") = 8364, Chr$(128) = ChrW$(8364) = "€"

Asc/AscW values in codepage 1250 ANSI Central European
ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW
€ 128 8364 | 144 144 | NBSP 160 160 | ° 176 176 | Ŕ 192 340 | Đ 208 272 | ŕ 224 341 | đ 240 273
129 129 | ‘ 145 8216 | ˇ 161 711 | ± 177 177 | Á 193 193 | Ń 209 323 | á 225 225 | ń 241 324
‚ 130 8218 | ’ 146 8217 | ˘ 162 728 | ˛ 178 731 | Â 194 194 | Ň 210 327 | â 226 226 | ň 242 328
131 131 | “ 147 8220 | Ł 163 321 | ł 179 322 | Ă 195 258 | Ó 211 211 | ă 227 259 | ó 243 243
„ 132 8222 | ” 148 8221 | ¤ 164 164 | ´ 180 180 | Ä 196 196 | Ô 212 212 | ä 228 228 | ô 244 244
… 133 8230 | • 149 8226 | Ą 165 260 | µ 181 181 | Ĺ 197 313 | Ő 213 336 | ĺ 229 314 | ő 245 337
† 134 8224 | – 150 8211 | ¦ 166 166 | ¶ 182 182 | Ć 198 262 | Ö 214 214 | ć 230 263 | ö 246 246
‡ 135 8225 | — 151 8212 | § 167 167 | · 183 183 | Ç 199 199 | × 215 215 | ç 231 231 | ÷ 247 247
136 136 | 152 152 | ¨ 168 168 | ¸ 184 184 | Č 200 268 | Ř 216 344 | č 232 269 | ř 248 345
‰ 137 8240 | ™ 153 8482 | © 169 169 | ą 185 261 | É 201 201 | Ů 217 366 | é 233 233 | ů 249 367
Š 138 352 | š 154 353 | Ş 170 350 | ş 186 351 | Ę 202 280 | Ú 218 218 | ę 234 281 | ú 250 250
‹ 139 8249 | › 155 8250 | « 171 171 | » 187 187 | Ë 203 203 | Ű 219 368 | ë 235 235 | ű 251 369
Ś 140 346 | ś 156 347 | ¬ 172 172 | Ľ 188 317 | Ě 204 282 | Ü 220 220 | ě 236 283 | ü 252 252
Ť 141 356 | ť 157 357 | SHY 173 173 | ˝ 189 733 | Í 205 205 | Ý 221 221 | í 237 237 | ý 253 253
Ž 142 381 | ž 158 382 | ® 174 174 | ľ 190 318 | Î 206 206 | Ţ 222 354 | î 238 238 | ţ 254 355
Ź 143 377 | ź 159 378 | Ż 175 379 | ż 191 380 | Ď 207 270 | ß 223 223 | ď 239 271 | ˙ 255 729
Asc/AscW values in codepage 1251 ANSI Cyrillic
ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW
Ђ 128 1026 | ђ 144 1106 | NBSP 160 160 | ° 176 176 | А 192 1040 | Р 208 1056 | а 224 1072 | р 240 1088
Ѓ 129 1027 | ‘ 145 8216 | Ў 161 1038 | ± 177 177 | Б 193 1041 | С 209 1057 | б 225 1073 | с 241 1089
‚ 130 8218 | ’ 146 8217 | ў 162 1118 | І 178 1030 | В 194 1042 | Т 210 1058 | в 226 1074 | т 242 1090
ѓ 131 1107 | “ 147 8220 | Ј 163 1032 | і 179 1110 | Г 195 1043 | У 211 1059 | г 227 1075 | у 243 1091
„ 132 8222 | ” 148 8221 | ¤ 164 164 | ґ 180 1169 | Д 196 1044 | Ф 212 1060 | д 228 1076 | ф 244 1092
… 133 8230 | • 149 8226 | Ґ 165 1168 | µ 181 181 | Е 197 1045 | Х 213 1061 | е 229 1077 | х 245 1093
† 134 8224 | – 150 8211 | ¦ 166 166 | ¶ 182 182 | Ж 198 1046 | Ц 214 1062 | ж 230 1078 | ц 246 1094
‡ 135 8225 | — 151 8212 | § 167 167 | · 183 183 | З 199 1047 | Ч 215 1063 | з 231 1079 | ч 247 1095
€ 136 8364 | 152 152 | Ё 168 1025 | ё 184 1105 | И 200 1048 | Ш 216 1064 | и 232 1080 | ш 248 1096
‰ 137 8240 | ™ 153 8482 | © 169 169 | № 185 8470 | Й 201 1049 | Щ 217 1065 | й 233 1081 | щ 249 1097
Љ 138 1033 | љ 154 1113 | Є 170 1028 | є 186 1108 | К 202 1050 | Ъ 218 1066 | к 234 1082 | ъ 250 1098
‹ 139 8249 | › 155 8250 | « 171 171 | » 187 187 | Л 203 1051 | Ы 219 1067 | л 235 1083 | ы 251 1099
Њ 140 1034 | њ 156 1114 | ¬ 172 172 | ј 188 1112 | М 204 1052 | Ь 220 1068 | м 236 1084 | ь 252 1100
Ќ 141 1036 | ќ 157 1116 | SHY 173 173 | Ѕ 189 1029 | Н 205 1053 | Э 221 1069 | н 237 1085 | э 253 1101
Ћ 142 1035 | ћ 158 1115 | ® 174 174 | ѕ 190 1109 | О 206 1054 | Ю 222 1070 | о 238 1086 | ю 254 1102
Џ 143 1039 | џ 159 1119 | Ї 175 1031 | ї 191 1111 | П 207 1055 | Я 223 1071 | п 239 1087 | я 255 1103
Asc/AscW values in codepage 1252 ANSI Latin I
ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW
€ 128 8364 | 144 144 | NBSP 160 160 | ° 176 176 | À 192 192 | Ð 208 208 | à 224 224 | ð 240 240
129 129 | ‘ 145 8216 | ¡ 161 161 | ± 177 177 | Á 193 193 | Ñ 209 209 | á 225 225 | ñ 241 241
‚ 130 8218 | ’ 146 8217 | ¢ 162 162 | ² 178 178 | Â 194 194 | Ò 210 210 | â 226 226 | ò 242 242
ƒ 131 402 | “ 147 8220 | £ 163 163 | ³ 179 179 | Ã 195 195 | Ó 211 211 | ã 227 227 | ó 243 243
„ 132 8222 | ” 148 8221 | ¤ 164 164 | ´ 180 180 | Ä 196 196 | Ô 212 212 | ä 228 228 | ô 244 244
… 133 8230 | • 149 8226 | ¥ 165 165 | µ 181 181 | Å 197 197 | Õ 213 213 | å 229 229 | õ 245 245
† 134 8224 | – 150 8211 | ¦ 166 166 | ¶ 182 182 | Æ 198 198 | Ö 214 214 | æ 230 230 | ö 246 246
‡ 135 8225 | — 151 8212 | § 167 167 | · 183 183 | Ç 199 199 | × 215 215 | ç 231 231 | ÷ 247 247
ˆ 136 710 | ˜ 152 732 | ¨ 168 168 | ¸ 184 184 | È 200 200 | Ø 216 216 | è 232 232 | ø 248 248
‰ 137 8240 | ™ 153 8482 | © 169 169 | ¹ 185 185 | É 201 201 | Ù 217 217 | é 233 233 | ù 249 249
Š 138 352 | š 154 353 | ª 170 170 | º 186 186 | Ê 202 202 | Ú 218 218 | ê 234 234 | ú 250 250
‹ 139 8249 | › 155 8250 | « 171 171 | » 187 187 | Ë 203 203 | Û 219 219 | ë 235 235 | û 251 251
Œ 140 338 | œ 156 339 | ¬ 172 172 | ¼ 188 188 | Ì 204 204 | Ü 220 220 | ì 236 236 | ü 252 252
141 141 | 157 157 | SHY 173 173 | ½ 189 189 | Í 205 205 | Ý 221 221 | í 237 237 | ý 253 253
Ž 142 381 | ž 158 382 | ® 174 174 | ¾ 190 190 | Î 206 206 | Þ 222 222 | î 238 238 | þ 254 254
143 143 | Ÿ 159 376 | ¯ 175 175 | ¿ 191 191 | Ï 207 207 | ß 223 223 | ï 239 239 | ÿ 255 255
Asc/AscW values in codepage 1253 ANSI Greek
ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW
€ 128 8364 | 144 144 | NBSP 160 160 | ° 176 176 | ΐ 192 912 | Π 208 928 | ΰ 224 944 | π 240 960
129 129 | ‘ 145 8216 | ΅ 161 901 | ± 177 177 | Α 193 913 | Ρ 209 929 | α 225 945 | ρ 241 961
‚ 130 8218 | ’ 146 8217 | Ά 162 902 | ² 178 178 | Β 194 914 | 210 -1798 | β 226 946 | ς 242 962
ƒ 131 402 | “ 147 8220 | £ 163 163 | ³ 179 179 | Γ 195 915 | Σ 211 931 | γ 227 947 | σ 243 963
„ 132 8222 | ” 148 8221 | ¤ 164 164 | ΄ 180 900 | Δ 196 916 | Τ 212 932 | δ 228 948 | τ 244 964
… 133 8230 | • 149 8226 | ¥ 165 165 | µ 181 181 | Ε 197 917 | Υ 213 933 | ε 229 949 | υ 245 965
† 134 8224 | – 150 8211 | ¦ 166 166 | ¶ 182 182 | Ζ 198 918 | Φ 214 934 | ζ 230 950 | φ 246 966
‡ 135 8225 | — 151 8212 | § 167 167 | · 183 183 | Η 199 919 | Χ 215 935 | η 231 951 | χ 247 967
136 136 | 152 152 | ¨ 168 168 | Έ 184 904 | Θ 200 920 | Ψ 216 936 | θ 232 952 | ψ 248 968
‰ 137 8240 | ™ 153 8482 | © 169 169 | Ή 185 905 | Ι 201 921 | Ω 217 937 | ι 233 953 | ω 249 969
138 138 | 154 154 | 170 -1799 | Ί 186 906 | Κ 202 922 | Ϊ 218 938 | κ 234 954 | ϊ 250 970
‹ 139 8249 | › 155 8250 | « 171 171 | » 187 187 | Λ 203 923 | Ϋ 219 939 | λ 235 955 | ϋ 251 971
140 140 | 156 156 | ¬ 172 172 | Ό 188 908 | Μ 204 924 | ά 220 940 | μ 236 956 | ό 252 972
141 141 | 157 157 | SHY 173 173 | ½ 189 189 | Ν 205 925 | έ 221 941 | ν 237 957 | ύ 253 973
142 142 | 158 158 | ® 174 174 | Ύ 190 910 | Ξ 206 926 | ή 222 942 | ξ 238 958 | ώ 254 974
143 143 | 159 159 | ― 175 8213 | Ώ 191 911 | Ο 207 927 | ί 223 943 | ο 239 959 | 255 -1797
Asc/AscW values in codepage 1254 ANSI Turkish
ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW
€ 128 8364 | 144 144 | NBSP 160 160 | ° 176 176 | À 192 192 | Ğ 208 286 | à 224 224 | ğ 240 287
129 129 | ‘ 145 8216 | ¡ 161 161 | ± 177 177 | Á 193 193 | Ñ 209 209 | á 225 225 | ñ 241 241
‚ 130 8218 | ’ 146 8217 | ¢ 162 162 | ² 178 178 | Â 194 194 | Ò 210 210 | â 226 226 | ò 242 242
ƒ 131 402 | “ 147 8220 | £ 163 163 | ³ 179 179 | Ã 195 195 | Ó 211 211 | ã 227 227 | ó 243 243
„ 132 8222 | ” 148 8221 | ¤ 164 164 | ´ 180 180 | Ä 196 196 | Ô 212 212 | ä 228 228 | ô 244 244
… 133 8230 | • 149 8226 | ¥ 165 165 | µ 181 181 | Å 197 197 | Õ 213 213 | å 229 229 | õ 245 245
† 134 8224 | – 150 8211 | ¦ 166 166 | ¶ 182 182 | Æ 198 198 | Ö 214 214 | æ 230 230 | ö 246 246
‡ 135 8225 | — 151 8212 | § 167 167 | · 183 183 | Ç 199 199 | × 215 215 | ç 231 231 | ÷ 247 247
ˆ 136 710 | ˜ 152 732 | ¨ 168 168 | ¸ 184 184 | È 200 200 | Ø 216 216 | è 232 232 | ø 248 248
‰ 137 8240 | ™ 153 8482 | © 169 169 | ¹ 185 185 | É 201 201 | Ù 217 217 | é 233 233 | ù 249 249
Š 138 352 | š 154 353 | ª 170 170 | º 186 186 | Ê 202 202 | Ú 218 218 | ê 234 234 | ú 250 250
‹ 139 8249 | › 155 8250 | « 171 171 | » 187 187 | Ë 203 203 | Û 219 219 | ë 235 235 | û 251 251
Œ 140 338 | œ 156 339 | ¬ 172 172 | ¼ 188 188 | Ì 204 204 | Ü 220 220 | ì 236 236 | ü 252 252
141 141 | 157 157 | SHY 173 173 | ½ 189 189 | Í 205 205 | İ 221 304 | í 237 237 | ı 253 305
142 142 | 158 158 | ® 174 174 | ¾ 190 190 | Î 206 206 | Ş 222 350 | î 238 238 | ş 254 351
143 143 | Ÿ 159 376 | ¯ 175 175 | ¿ 191 191 | Ï 207 207 | ß 223 223 | ï 239 239 | ÿ 255 255
Asc/AscW values in codepage 1255 ANSI Hebrew
ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW
€ 128 8364 | 144 144 | NBSP 160 160 | ° 176 176 | ְ 192 1456 | ׀ 208 1472 | א 224 1488 | נ 240 1504
129 129 | ‘ 145 8216 | ¡ 161 161 | ± 177 177 | ֱ 193 1457 | ׁ 209 1473 | ב 225 1489 | ס 241 1505
‚ 130 8218 | ’ 146 8217 | ¢ 162 162 | ² 178 178 | ֲ 194 1458 | ׂ 210 1474 | ג 226 1490 | ע 242 1506
ƒ 131 402 | “ 147 8220 | £ 163 163 | ³ 179 179 | ֳ 195 1459 | ׃ 211 1475 | ד 227 1491 | ף 243 1507
„ 132 8222 | ” 148 8221 | ₪ 164 8362 | ´ 180 180 | ִ 196 1460 | װ 212 1520 | ה 228 1492 | פ 244 1508
… 133 8230 | • 149 8226 | ¥ 165 165 | µ 181 181 | ֵ 197 1461 | ױ 213 1521 | ו 229 1493 | ץ 245 1509
† 134 8224 | – 150 8211 | ¦ 166 166 | ¶ 182 182 | ֶ 198 1462 | ײ 214 1522 | ז 230 1494 | צ 246 1510
‡ 135 8225 | — 151 8212 | § 167 167 | · 183 183 | ַ 199 1463 | ׳ 215 1523 | ח 231 1495 | ק 247 1511
ˆ 136 710 | ˜ 152 732 | ¨ 168 168 | ¸ 184 184 | ָ 200 1464 | ״ 216 1524 | ט 232 1496 | ר 248 1512
‰ 137 8240 | ™ 153 8482 | © 169 169 | ¹ 185 185 | ֹ 201 1465 | 217 -1907 | י 233 1497 | ש 249 1513
138 138 | 154 154 | × 170 215 | ÷ 186 247 | ֺ 202 1466 | 218 -1906 | ך 234 1498 | ת 250 1514
‹ 139 8249 | › 155 8250 | « 171 171 | » 187 187 | ֻ 203 1467 | 219 -1905 | כ 235 1499 | 251 -1900
140 140 | 156 156 | ¬ 172 172 | ¼ 188 188 | ּ 204 1468 | 220 -1904 | ל 236 1500 | 252 -1899
141 141 | 157 157 | SHY 173 173 | ½ 189 189 | ֽ 205 1469 | 221 -1903 | ם 237 1501 | LRM 253 8206
142 142 | 158 158 | ® 174 174 | ¾ 190 190 | ־ 206 1470 | 222 -1902 | מ 238 1502 | RLM 254 8207
143 143 | 159 159 | ¯ 175 175 | ¿ 191 191 | ֿ 207 1471 | 223 -1901 | ן 239 1503 | 255 -1898
Asc/AscW values in codepage 1256 ANSI Arabic
ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW
€ 128 8364 | گ 144 1711 | NBSP 160 160 | ° 176 176 | ہ 192 1729 | ذ 208 1584 | à 224 224 | ً 240 1611
پ 129 1662 | ‘ 145 8216 | ، 161 1548 | ± 177 177 | ء 193 1569 | ر 209 1585 | ل 225 1604 | ٌ 241 1612
‚ 130 8218 | ’ 146 8217 | ¢ 162 162 | ² 178 178 | آ 194 1570 | ز 210 1586 | â 226 226 | ٍ 242 1613
ƒ 131 402 | “ 147 8220 | £ 163 163 | ³ 179 179 | أ 195 1571 | س 211 1587 | م 227 1605 | َ 243 1614
„ 132 8222 | ” 148 8221 | ¤ 164 164 | ´ 180 180 | ؤ 196 1572 | ش 212 1588 | ن 228 1606 | ô 244 244
… 133 8230 | • 149 8226 | ¥ 165 165 | µ 181 181 | إ 197 1573 | ص 213 1589 | ه 229 1607 | ُ 245 1615
† 134 8224 | – 150 8211 | ¦ 166 166 | ¶ 182 182 | ئ 198 1574 | ض 214 1590 | و 230 1608 | ِ 246 1616
‡ 135 8225 | — 151 8212 | § 167 167 | · 183 183 | ا 199 1575 | × 215 215 | ç 231 231 | ÷ 247 247
ˆ 136 710 | ک 152 1705 | ¨ 168 168 | ¸ 184 184 | ب 200 1576 | ط 216 1591 | è 232 232 | ّ 248 1617
‰ 137 8240 | ™ 153 8482 | © 169 169 | ¹ 185 185 | ة 201 1577 | ظ 217 1592 | é 233 233 | ù 249 249
ٹ 138 1657 | ڑ 154 1681 | ھ 170 1726 | ؛ 186 1563 | ت 202 1578 | ع 218 1593 | ê 234 234 | ْ 250 1618
‹ 139 8249 | › 155 8250 | « 171 171 | » 187 187 | ث 203 1579 | غ 219 1594 | ë 235 235 | û 251 251
Œ 140 338 | œ 156 339 | ¬ 172 172 | ¼ 188 188 | ج 204 1580 | ـ 220 1600 | ى 236 1609 | ü 252 252
چ 141 1670 | ZWNJ 157 8204 | SHY 173 173 | ½ 189 189 | ح 205 1581 | ف 221 1601 | ي 237 1610 | LRM 253 8206
ژ 142 1688 | ZWJ 158 8205 | ® 174 174 | ¾ 190 190 | خ 206 1582 | ق 222 1602 | î 238 238 | RLM 254 8207
ڈ 143 1672 | ں 159 1722 | ¯ 175 175 | ؟ 191 1567 | د 207 1583 | ك 223 1603 | ï 239 239 | ے 255 1746
Asc/AscW values in codepage 1257 ANSI Baltic
ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW
€ 128 8364 | 144 144 | NBSP 160 160 | ° 176 176 | Ą 192 260 | Š 208 352 | ą 224 261 | š 240 353
129 129 | ‘ 145 8216 | 161 -1796 | ± 177 177 | Į 193 302 | Ń 209 323 | į 225 303 | ń 241 324
‚ 130 8218 | ’ 146 8217 | ¢ 162 162 | ² 178 178 | Ā 194 256 | Ņ 210 325 | ā 226 257 | ņ 242 326
131 131 | “ 147 8220 | £ 163 163 | ³ 179 179 | Ć 195 262 | Ó 211 211 | ć 227 263 | ó 243 243
„ 132 8222 | ” 148 8221 | ¤ 164 164 | ´ 180 180 | Ä 196 196 | Ō 212 332 | ä 228 228 | ō 244 333
… 133 8230 | • 149 8226 | 165 -1795 | µ 181 181 | Å 197 197 | Õ 213 213 | å 229 229 | õ 245 245
† 134 8224 | – 150 8211 | ¦ 166 166 | ¶ 182 182 | Ę 198 280 | Ö 214 214 | ę 230 281 | ö 246 246
‡ 135 8225 | — 151 8212 | § 167 167 | · 183 183 | Ē 199 274 | × 215 215 | ē 231 275 | ÷ 247 247
136 136 | 152 152 | Ø 168 216 | ø 184 248 | Č 200 268 | Ų 216 370 | č 232 269 | ų 248 371
‰ 137 8240 | ™ 153 8482 | © 169 169 | ¹ 185 185 | É 201 201 | Ł 217 321 | é 233 233 | ł 249 322
138 138 | 154 154 | Ŗ 170 342 | ŗ 186 343 | Ź 202 377 | Ś 218 346 | ź 234 378 | ś 250 347
‹ 139 8249 | › 155 8250 | « 171 171 | » 187 187 | Ė 203 278 | Ū 219 362 | ė 235 279 | ū 251 363
140 140 | 156 156 | ¬ 172 172 | ¼ 188 188 | Ģ 204 290 | Ü 220 220 | ģ 236 291 | ü 252 252
¨ 141 168 | ¯ 157 175 | SHY 173 173 | ½ 189 189 | Ķ 205 310 | Ż 221 379 | ķ 237 311 | ż 253 380
ˇ 142 711 | ˛ 158 731 | ® 174 174 | ¾ 190 190 | Ī 206 298 | Ž 222 381 | ī 238 299 | ž 254 382
¸ 143 184 | 159 159 | Æ 175 198 | æ 191 230 | Ļ 207 315 | ß 223 223 | ļ 239 316 | ˙ 255 729
Asc/AscW values in codepage 1258 ANSI/OEM - Vietnamese
ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW
€ 128 8364 | 144 144 | NBSP 160 160 | ° 176 176 | À 192 192 | Đ 208 272 | à 224 224 | đ 240 273
129 129 | ‘ 145 8216 | ¡ 161 161 | ± 177 177 | Á 193 193 | Ñ 209 209 | á 225 225 | ñ 241 241
‚ 130 8218 | ’ 146 8217 | ¢ 162 162 | ² 178 178 | Â 194 194 | ̉ 210 777 | â 226 226 | ̣ 242 803
ƒ 131 402 | “ 147 8220 | £ 163 163 | ³ 179 179 | Ă 195 258 | Ó 211 211 | ă 227 259 | ó 243 243
„ 132 8222 | ” 148 8221 | ¤ 164 164 | ´ 180 180 | Ä 196 196 | Ô 212 212 | ä 228 228 | ô 244 244
… 133 8230 | • 149 8226 | ¥ 165 165 | µ 181 181 | Å 197 197 | Ơ 213 416 | å 229 229 | ơ 245 417
† 134 8224 | – 150 8211 | ¦ 166 166 | ¶ 182 182 | Æ 198 198 | Ö 214 214 | æ 230 230 | ö 246 246
‡ 135 8225 | — 151 8212 | § 167 167 | · 183 183 | Ç 199 199 | × 215 215 | ç 231 231 | ÷ 247 247
ˆ 136 710 | ˜ 152 732 | ¨ 168 168 | ¸ 184 184 | È 200 200 | Ø 216 216 | è 232 232 | ø 248 248
‰ 137 8240 | ™ 153 8482 | © 169 169 | ¹ 185 185 | É 201 201 | Ù 217 217 | é 233 233 | ù 249 249
138 138 | 154 154 | ª 170 170 | º 186 186 | Ê 202 202 | Ú 218 218 | ê 234 234 | ú 250 250
‹ 139 8249 | › 155 8250 | « 171 171 | » 187 187 | Ë 203 203 | Û 219 219 | ë 235 235 | û 251 251
Œ 140 338 | œ 156 339 | ¬ 172 172 | ¼ 188 188 | ̀ 204 768 | Ü 220 220 | ́ 236 769 | ü 252 252
141 141 | 157 157 | SHY 173 173 | ½ 189 189 | Í 205 205 | Ư 221 431 | í 237 237 | ư 253 432
142 142 | 158 158 | ® 174 174 | ¾ 190 190 | Î 206 206 | ̃ 222 771 | î 238 238 | ₫ 254 8363
143 143 | Ÿ 159 376 | ¯ 175 175 | ¿ 191 191 | Ï 207 207 | ß 223 223 | ï 239 239 | ÿ 255 255
Asc/AscW values in codepage 874 MS-DOS Thai
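ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW | ch Asc AscW
€ 128 8364 | 144 144 | NBSP 160 160 | ฐ 176 3600 | ภ 192 3616 | ะ 208 3632 | เ 224 3648 | ๐ 240 3664
129 129 | ‘ 145 8216 | ก 161 3585 | ฑ 177 3601 | ม 193 3617 | ั 209 3633 | แ 225 3649 | ๑ 241 3665
130 130 | ’ 146 8217 | ข 162 3586 | ฒ 178 3602 | ย 194 3618 | า 210 3634 | โ 226 3650 | ๒ 242 3666
131 131 | “ 147 8220 | ฃ 163 3587 | ณ 179 3603 | ร 195 3619 | ำ 211 3635 | ใ 227 3651 | ๓ 243 3667
132 132 | ” 148 8221 | ค 164 3588 | ด 180 3604 | ฤ 196 3620 | ิ 212 3636 | ไ 228 3652 | ๔ 244 3668
… 133 8230 | • 149 8226 | ฅ 165 3589 | ต 181 3605 | ล 197 3621 | ี 213 3637 | ๅ 229 3653 | ๕ 245 3669
134 134 | – 150 8211 | ฆ 166 3590 | ถ 182 3606 | ฦ 198 3622 | ึ 214 3638 | ๆ 230 3654 | ๖ 246 3670
135 135 | — 151 8212 | ง 167 3591 | ท 183 3607 | ว 199 3623 | ื 215 3639 | ็ 231 3655 | ๗ 247 3671
136 136 | 152 152 | จ 168 3592 | ธ 184 3608 | ศ 200 3624 | ุ 216 3640 | ่ 232 3656 | ๘ 248 3672
137 137 | 153 153 | ฉ 169 3593 | น 185 3609 | ษ 201 3625 | ู 217 3641 | ้ 233 3657 | ๙ 249 3673
138 138 | 154 154 | ช 170 3594 | บ 186 3610 | ส 202 3626 | ฺ 218 3642 | ๊ 234 3658 | ๚ 250 3674
139 139 | 155 155 | ซ 171 3595 | ป 187 3611 | ห 203 3627 | 219 -1855 | ๋ 235 3659 | ๛ 251 3675
140 140 | 156 156 | ฌ 172 3596 | ผ 188 3612 | ฬ 204 3628 | 220 -1854 | ์ 236 3660 | 252 -1851
141 141 | 157 157 | ญ 173 3597 | ฝ 189 3613 | อ 205 3629 | 221 -1853 | ํ 237 3661 | 253 -1850
142 142 | 158 158 | ฎ 174 3598 | พ 190 3614 | ฮ 206 3630 | 222 -1852 | ๎ 238 3662 | 254 -1849
143 143 | 159 159 | ฏ 175 3599 | ฟ 191 3615 | ฯ 207 3631 | ฿ 223 3647 | ๏ 239 3663 | 255 -1848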

Not all characters may display correctly in old browsers or operating systems. Most codepages contain a few unused slots, which may show up as a rectangle, a specific character or a regular character. These tables are intended for a general understanding of how VB6 works. Because of the unused slots, do not use these tables as a reliable source for converters; for an accurate definition of code pages, see a code page reference. Chinese, Japanese and Korean codepages have been left out, as they use double-byte codes that exceed the 128-255 range.
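To see the values for the codepage your own machine uses, you can generate a table like the ones above. This is a sketch that assumes a single-byte Ansi codepage; AnsiToUnicode and PrintCodepageTable are hypothetical names. Chr$ turns the Ansi value into a character using the system codepage, and AscW then reports its Unicode value.

```vb
' Returns the Unicode value VB6 assigns to Ansi value i in the current
' system codepage. Assumes a single-byte codepage (not Chinese, Japanese
' or Korean). AnsiToUnicode is a hypothetical helper name.
Public Function AnsiToUnicode(ByVal i As Integer) As Integer
    AnsiToUnicode = AscW(Chr$(i))
End Function

' Prints the Asc/AscW pairs for 128-255 to the Immediate window.
Public Sub PrintCodepageTable()
    Dim i As Integer
    For i = 128 To 255
        Debug.Print i, AnsiToUnicode(i)
    Next
End Sub
```

On a machine set to codepage 1252, AnsiToUnicode(128) should return 8364, matching the € entry in the Latin I table.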

Converting Asc to AscW and Chr to ChrW

Generally speaking, AscW and ChrW are safer to use than Asc and Chr. AscW and ChrW behave the same everywhere, whereas Asc and Chr return different results in different locales. It thus makes sense to use AscW and ChrW for both optimization and internationalization. The good news is that VB6 is fully capable of using all values of AscW and ChrW regardless of the locale. VB6 can handle Russian characters in the USA and Hebrew in Greece, and the user needs to install nothing extra. Displaying, outputting and inputting unusual characters may require tricks, but internally VB6 handles all characters just fine. (Internationalization is beyond the scope of this article.)

The problem with routinely converting Asc to AscW and Chr to ChrW is that your code may change in subtle ways and new bugs may creep in. There are two kinds of bugs to expect:

  • Bug 1. Asc and AscW return different values, and so do Chr and ChrW. You can get an unexpected value or character after conversion.
  • Bug 2. AscW returns a full range of values from -32768 to 32767. For single-byte codepages (i.e. not Korean/Chinese/Japanese), Asc returns values 0 to 255. Thus, many pieces of code expect only 0-255. Make sure your code can deal with negative values and also values exceeding 255. You must use the Integer or Long datatype to store the return value of AscW, whereas your code may have run nicely storing Asc values in a Byte.
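To cope with Bug 2, you can store the AscW value in an Integer or Long, or convert it to the unsigned range 0-65535 by masking off the sign. UnicodeValue is a hypothetical helper name.

```vb
' AscW returns an Integer (-32768 to 32767), so characters U+8000 to U+FFFF
' come back negative. Masking with &HFFFF& yields 0 to 65535 as a Long.
' UnicodeValue is a hypothetical helper name.
Public Function UnicodeValue(ByRef ch As String) As Long
    UnicodeValue = AscW(ch) And &HFFFF&
End Function

' Storing AscW in a Byte is the typical Bug 2: this raises run-time
' error 6 (Overflow) for any value outside 0-255, e.g. the euro sign.
'   Dim b As Byte
'   b = AscW(ChrW$(8364))
```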

If you are working with the Latin I codepage (1252), as many VB6 developers are, the problem characters you need to be aware of are Ansi 128-159. Within this range, Asc and AscW (and Chr and ChrW) mostly differ; only the unused values 129, 141, 143, 144 and 157 map to themselves. In the ranges 0-127 and 160-255, Unicode equals Ansi. For characters in those ranges, converting Asc to AscW and Chr to ChrW should be straightforward and safe, and will only make your program more international and faster.

Here are a few examples of how your code may go wrong:

  • Of the codepages above, Asc("€")=128 in all but the Cyrillic codepage, where it is 136. Best use AscW("€")=8364 everywhere.
  • Asc("½")=189 in the Latin I codepage and several others, but in the Central European codepage, 189 represents the ˝ character. Best use AscW("½")=189 everywhere.
  • Chr$(223)="ß" in the Latin I codepage and some others, but not in, for example, the Cyrillic, Greek or Arabic codepages. Better use ChrW$(223) to get "ß".
  • Testing whether s starts with a pound sign, as in s="£", succeeds with Asc(s)=163 in many locales. However, it fails in the Central European codepage, where 163 is Ł. Better test with AscW(s)=163.
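A safe version of the pound test might look like this. VB6's And does not short-circuit, so writing Len(s) > 0 And AscW(s) = 163 as one expression would still raise run-time error 5 for an empty string; the nested If avoids that. StartsWithPound is a hypothetical name, and the test below uses ChrW$(163) so that it does not depend on the source file's codepage.

```vb
' Returns True if the first character of s is the pound sign (U+00A3).
' The nested If matters: VB6's And evaluates both sides, and AscW("")
' raises run-time error 5. StartsWithPound is a hypothetical name.
Public Function StartsWithPound(ByRef s As String) As Boolean
    If Len(s) > 0 Then StartsWithPound = (AscW(s) = 163)
End Function
```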

As you can see, it's a really good idea to use the Unicode versions, but you must know what you're doing.

Caveat with extra parentheses

When you pass a string as an argument in a procedure call, an extra pair of parentheses can make an unnecessary copy of the string. Unfortunately, VB6's syntax is somewhat tricky about when parentheses are required: sometimes they are required, sometimes they are superfluous.

Consider the following procedure that takes a string parameter by reference. In VB6 there are 2 ways to declare a reference parameter, so we have provided two syntax examples for the same thing:

Sub Process(ByRef s As String)  ' Preferred syntax
Sub Process(s As String)        ' Alternative syntax

Let's further assume the procedure doesn't modify s, but only reads its value. The purpose of ByRef (instead of ByVal) is to avoid making an unnecessary copy of s. So far so good. This looks like optimal coding.

Now, is this the correct way to call the Sub?

Process (s)

No! That's bad! The parentheses around (s) are extra in VB6. While they are required in VB.NET and a bunch of other programming languages, in VB6 you can (and should) do without them. Here is the correct way:

Process s

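The difference is that Process s passes s by reference, while in Process (s) the parentheses turn s into an expression: VB6 evaluates it into a temporary copy and passes that copy instead.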
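To see the difference for yourself, give the procedure something visible to do. In this sketch, AppendBang and DemoSubCalls are hypothetical names. AppendBang modifies its ByRef argument, so the caller's variable changes only when s itself is passed.

```vb
' AppendBang is a hypothetical Sub that changes its ByRef argument,
' which makes it visible whether the caller's variable or a copy was passed.
Public Sub AppendBang(ByRef s As String)
    s = s & "!"
End Sub

Public Sub DemoSubCalls()
    Dim s As String
    s = "Hi"
    AppendBang s        ' by reference: s becomes "Hi!"
    AppendBang (s)      ' parenthesized expression: a temporary copy gets the "!"
    Debug.Print s       ' prints Hi!
End Sub
```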

The syntax is different when calling a function to get its return value. If Process is a Function, the correct, optimal way to call it is this:

x = Process(s)

To make a copy of s, you need to add an extra pair of parentheses:

x = Process((s))

If you use the obsolete Call keyword, the correct syntax is:

Call Process(s)

To make a copy of s, you need to add an extra pair of parentheses:

Call Process((s))

Tricky, isn't it!
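The Function forms can be checked the same way. AppendMark and DemoFunctionCalls are hypothetical names; AppendMark changes its ByRef argument and returns the new length.

```vb
' AppendMark is a hypothetical Function that changes its ByRef argument
' and returns the new length, so each call form shows whether s was copied.
Public Function AppendMark(ByRef s As String) As Long
    s = s & "!"
    AppendMark = Len(s)
End Function

Public Sub DemoFunctionCalls()
    Dim s As String, n As Long
    s = "Hi"
    n = AppendMark(s)          ' by reference: s is now "Hi!"
    n = AppendMark((s))        ' copy: n = 4, but s stays "Hi!"
    Call AppendMark(s)         ' by reference: s is now "Hi!!"
    Call AppendMark((s))       ' copy: s stays "Hi!!"
    Debug.Assert s = "Hi!!"
End Sub
```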

Summary of string optimization rules

The following table summarizes the optimization rules presented above.

String optimization rules, Part III
Slow | Fast | When
Left$(s, n) | s | n>=Len(s)
Mid$(s, 1) | s | always
Mid$(s, 1, n) | s | n>=Len(s)
Mid$(s, 1, n) | Left$(s, n) | n<Len(s)
Mid$(s, x, n) | Right$(s, n) | need end-of-string (note bug risk)
Mid$(s, x) | Mid$(s, x, n) | need middle-of-string, can truncate
Right$(s, n) | s | n>=Len(s)
AscW(Left$(s, n)) | AscW(s) | n>=1
AscW(Mid$(s, x)) | AscW(Mid$(s, x, 1)) |
AscW(Right$(s, n)) | AscW(Mid$(s, Len(s) - n + 1, 1)) | n>1
Asc(s) | AscW(s) | return value in range 0..127
Chr$(i) | ChrW$(i) | i in range 0..127
Process (s) | Process s | pass s by reference


URN:NBN:fi-fe201003011417

©Aivosto Oy -