Regular expressions - An introduction
Regular expressions are a pattern matching standard for string parsing and replacement. They are used on a wide range of platforms and programming environments. Originally missing in Visual Basic, regular expressions are now available for most VB and VBA versions.
Regular expressions, or regexes for short, are a way to match text with patterns. They are a powerful way to find and replace strings that follow a defined format. For example, regular expressions can be used to parse dates, URLs and email addresses, log files, configuration files, command line switches or programming scripts.
Since regexes are language independent, this article tries to stay as language independent as possible. Note, however, that not all regex implementations are the same. The text below is based on Perl 5.0, which is also the format that RegExpr for VB/VBA uses. Other implementations may handle some expressions differently.
In its simplest form, a regular expression is a string of symbols to match "as is".
That's not very impressive yet. But you can see that regexes match the first case found, once, anywhere in the input string.
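As a minimal sketch of this behavior, here is a literal match in Python. The `re` module and the sample text are assumptions chosen for illustration; other Perl-style engines should behave the same way for a pattern this simple.

```python
import re

# A literal pattern matches itself "as is".
text = "xx abc yy abc"
match = re.search("abc", text)

# search() reports only the first occurrence, wherever it starts.
first_start = match.start()   # position of the first "abc" in the text
first_text = match.group()    # the text that was matched

# A pattern that does not occur anywhere yields no match at all.
no_match = re.search("abd", text)
```

The second "abc" in the input is never reported, because the search stops at the first case found.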
So what if you want to match several characters? You need to use a quantifier. The most important quantifiers are * (zero or more of the preceding item), + (one or more) and ? (zero or one).
By default, regexes are greedy. They take as many characters as possible. In the next example, you can see that the regex matches as many 2's as there are.
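Greedy matching can be sketched in Python as follows. The input string is an assumption made up for this illustration.

```python
import re

text = "1222223"

# "2+" means one or more 2's. Being greedy, it takes every 2 in the run
# instead of stopping after the first one.
greedy = re.search(r"2+", text).group()

# "12?" means a 1 followed by zero or one 2, so at most one 2 is taken.
optional = re.search(r"12?", text).group()
```

Here `greedy` holds all five 2's, while `optional` holds only "12".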
Stingy matching, which matches as few characters as possible, is also available, but we'll skip it here. There are also more quantifiers than those mentioned.
A lot of special characters are available for regex building. Here are some of the more common ones.
Here are some likely uses for the special characters.
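A few common Perl-style special characters can be tried out in Python as a rough sketch. The selection (\d, \w, \s, the dot, ^ and $) and the sample text are assumptions picked for illustration, not a complete list.

```python
import re

line = "Order 66 shipped to Oslo"

digits = re.search(r"\d+", line).group()        # \d matches a digit
first_word = re.search(r"^\w+", line).group()   # ^ anchors to the start, \w is a word character
last_word = re.search(r"\w+$", line).group()    # $ anchors to the end
wildcard = re.search(r"O..o", line).group()     # . matches any single character
words = re.split(r"\s+", line)                  # \s matches whitespace

# To match a special character literally, escape it with a backslash.
loose_dot = re.search(r"3.14", "3x14") is not None    # unescaped dot accepts "x"
strict_dot = re.search(r"3\.14", "3x14") is not None  # escaped dot requires "."
```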
You can define a character class by putting characters between square brackets. Any one character listed in the class will match one character in the input.
Here are some sample uses.
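Character classes might be used like this in Python. The patterns and inputs are illustrative assumptions.

```python
import re

# Any single character from the list.
vowel = re.search(r"[aeiou]", "rhythm and blues").group()

# Ranges keep a class short: a "#" followed by hexadecimal digits.
hex_color = re.search(r"#[0-9A-Fa-f]+", "color: #FF8800").group()

# A leading ^ inside the brackets negates the class:
# here, a run of characters that are not digits.
non_digits = re.search(r"[^0-9]+", "2002-Nov").group()
```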
Grouping and alternatives
It's often necessary to group things together with parentheses ( and ).
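The sketch below shows two typical reasons to group, using Python: limiting the scope of an alternative written with |, and applying a quantifier to more than one character. The patterns are assumptions chosen for illustration.

```python
import re

# Parentheses limit the alternative to "cat" or "dog"; the "s" is required in both cases.
grouped = re.search(r"(cat|dog)s", "hot dogs").group()

# Without parentheses the alternatives are "cat" and "dogs",
# so "cats" matches only as far as "cat".
ungrouped = re.search(r"cat|dogs", "cats").group()

# A quantifier after a group repeats the whole group...
repeated_group = re.search(r"(ab)+", "xababab").group()

# ...whereas without the group it repeats only the last character.
repeated_char = re.search(r"ab+", "xabbb").group()
```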
With parentheses, you can also define subexpressions to remember after the match has happened. In the example below, the text matched by the part between ( and ) is remembered.
In these examples, what is matched by (\d+) gets stored. The regex engine will allow you to retrieve the stored value with a subsequent call. The implementation of the call varies. In RegExpr for VB/VBA, you call RegExprResult(1) to get the first stored value, RegExprResult(2) to get the second one, and so on. This way you can retrieve fields for further processing.
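For comparison, here is a sketch of the same idea in Python, where the stored values are read with `group(1)`, `group(2)` and so on. The input text is an assumption; the RegExpr call described above is not shown because its exact syntax depends on that library.

```python
import re

# Two (\d+) subexpressions, each storing the digits it matched.
match = re.search(r"(\d+)-(\d+)", "pages 12-34")

whole = match.group(0)    # the entire matched text
first = match.group(1)    # what the first (\d+) matched
second = match.group(2)   # what the second (\d+) matched
```

The stored values are strings, so convert them if you need numbers for further processing.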
So are regexes case sensitive? They can be either. It depends on how you write the regex call in your programming language. Refer to the documentation of your programming language or regex implementation on how to write the calls.
The above is by no means a complete description of regexes. There are more ways to write them, more special characters, and more quantifiers available. What's available also depends on the implementation. Some regex engines don't implement all of the possibilities, which makes them less suitable for some purposes. If you're interested in learning a more complete set of regexes, see the help file of RegExpr for VB/VBA. It's available for free download.
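In Python, for instance, matching is case sensitive unless the call passes the `re.IGNORECASE` flag. This is only one implementation's way of doing it, shown as a sketch; other engines use different switches.

```python
import re

text = "Hello World"

# Default: case sensitive, so lowercase "hello" does not match "Hello".
sensitive = re.search("hello", text)

# With the flag, letter case is ignored.
insensitive = re.search("hello", text, re.IGNORECASE)
```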
Here are a few practical examples of regular expressions. They are provided for learning purposes. In real applications, you should carefully design your regexes to match the exact use.
It's often necessary to check if a string is an email address or not. Here's one way to do it.
This example works for most cases but is not written based on any standard. It may accept non-working email addresses and reject working ones. Fine-tuning is required.
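A check along these lines could be sketched in Python as follows. The pattern and the helper name `looks_like_email` are assumptions made up for illustration; like any simple pattern, it is not based on a standard and will misjudge some addresses.

```python
import re

# Local part, an @ sign, then a domain with at least one dot in it.
EMAIL = re.compile(r"[\w.+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")

def looks_like_email(text):
    """Return True if the whole string has the rough shape of an email address."""
    # fullmatch() requires the pattern to cover the entire string.
    return EMAIL.fullmatch(text) is not None

ok_plain = looks_like_email("user@example.com")
ok_tagged = looks_like_email("first.last+tag@mail.example.org")
bad_no_at = looks_like_email("no-at-sign.example.com")
bad_no_dot = looks_like_email("user@localhost")
```

The last case shows the limitation mentioned above: an address without a dot in the domain is rejected even though it may work on a local network.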
Date strings are difficult to parse because there are so many variations. You can't always trust VB's own date conversion functions as the date may come in an unexpected or unsupported format. Let's assume we have a date string in the following format: 2002-Nov-14.
If a match is found, you can be sure that the input string is formatted like a date. However, a regex is not able to verify that it's a real date. For example, it could just as well be 5400-Qui-32, which doesn't look like an acceptable date to most applications. If you want to guard against such strange dates, you'll have to write a more limiting expression:
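The difference between a loose and a more limiting pattern can be sketched in Python like this. Both patterns are assumptions written for the 2002-Nov-14 format; the {4} and {2} quantifiers mean "exactly that many times".

```python
import re

# Loose: four digits, three letters, two digits.
LOOSE = re.compile(r"\d{4}-[A-Za-z]{3}-\d{2}")

# More limiting: years 1900-2099, real month abbreviations, days 01-31.
STRICT = re.compile(
    r"(19|20)\d{2}-"
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-"
    r"(0[1-9]|[12]\d|3[01])"
)

def is_loose_date(text):
    return LOOSE.fullmatch(text) is not None

def is_strict_date(text):
    return STRICT.fullmatch(text) is not None
```

Even the stricter pattern accepts 2002-Feb-31, so a final check with real date logic is still needed after the regex has split the fields.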
Web server logs come in several formats. This is a typical line in a log file.
As you can see, there are several fields on the line. They conform to a complex format. The fields are different from each other. A human-readable way to define the various fields is here:
The fields are host (visitor's Internet address), date and time (enclosed in square brackets), an HTTP request with file to retrieve (enclosed in quotation marks), numeric status code, numeric size of file, referer field (enclosed in quotation marks), and agent (browser) name (enclosed in quotation marks).
To programmatically parse the line, you need to split it into fields, then look at each field. This is a sample regex that will split the fields.
This example has been taken from a web log file parser script. To use it in your own code, you have to fine-tune it to suit your log file format. The regex assumes that lines come only in the presented format. If, say, a field is missing or the file contains garbage lines, the regex may fail. This results in an unparsed line.
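As a rough sketch of the approach, the following Python code splits a line into the fields listed above. It assumes the common "combined" log layout, in which two extra fields sit between the host and the date; the sample line, the pattern and the name `parse_line` are all made up for illustration and are not the parser script mentioned above.

```python
import re

LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ '   # host, then two fields that are skipped here
    r'\[([^\]]+)\] '     # date and time, enclosed in square brackets
    r'"([^"]*)" '        # HTTP request, enclosed in quotation marks
    r'(\d{3}) '          # numeric status code
    r'(\d+|-) '          # numeric size of file, or "-" when absent
    r'"([^"]*)" '        # referer, enclosed in quotation marks
    r'"([^"]*)"$'        # agent, enclosed in quotation marks
)

# Hypothetical sample line, invented for this example.
sample = ('192.0.2.10 - - [14/Nov/2002:10:15:30 +0200] '
          '"GET /index.html HTTP/1.0" 200 5120 '
          '"http://www.example.com/start.html" "Mozilla/4.0 (compatible)"')

def parse_line(line):
    """Return a dict of fields, or None when the line does not fit the format."""
    match = LOG_LINE.match(line)
    if match is None:
        return None  # unparsed line: missing field or garbage
    names = ("host", "time", "request", "status", "size", "referer", "agent")
    return dict(zip(names, match.groups()))

fields = parse_line(sample)
```

Returning `None` for lines that do not fit makes the unparsed lines easy to count or log separately.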
Regular expressions in Visual Basic
Earlier Visual Basic versions (from 1.0 to 6.0) didn't come with regular expressions. Neither did VBA. The .NET framework has regular expressions available.
For non-.NET programming, VB developers have to use a 3rd-party solution. Aivosto RegExpr is a solution that adds comprehensive support for regular expressions. Available as a pure source code module, it is an ideal way to add regexes to Visual Basic 5.0, 6.0 and VBA without any additional run-time file requirements.