Regular expressions - An introduction
Regular expressions are a pattern matching standard for string parsing and replacement. They are used on a wide range of platforms and programming environments. Originally missing in Visual Basic, regular expressions are now available for most VB and VBA versions.
Regular expressions, or regexes for short, are a way to match text with patterns. They are a powerful way to find and replace strings that follow a defined format. For example, regular expressions can be used to parse dates, URLs and email addresses, log files, configuration files, command line switches or programming scripts.
Since regexes are language independent, we try to keep this article as language independent as possible. Note, however, that not all regex implementations are the same. The text below is based on Perl 5.0, which is also the syntax that RegExpr for VB/VBA uses. Some implementations may handle certain expressions differently.
In its simplest form, a regular expression is a string of symbols to match "as is": each character in the pattern matches the same character in the input.
That's not very impressive yet, but it shows that a regex matches the first occurrence it finds, once, anywhere in the input string.
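The following minimal sketch shows literal matching in Python, whose `re` module follows Perl-style syntax closely enough for the basics covered here. The input string is made up for illustration.

```python
import re

# A pattern without special characters matches its symbols literally.
text = "xxabcyyabc"

# re.search scans the input from the left and stops at the first match.
m = re.search("abc", text)

print(m.group())  # abc
print(m.span())   # (2, 5): only the first occurrence is reported
```

The second `abc` at positions 7 to 9 is ignored. Finding every occurrence takes a separate call such as `re.findall`.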
So what if you want to match several characters? You need to use a quantifier. The most important quantifiers are * (zero or more times), + (one or more times) and ? (zero or one time).
Quantifiers take the preceding character as their argument and attempt to match it the given number of times.
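The sketch below applies the three quantifiers to the character `b`, again using Python's `re` module. It calls `re.fullmatch`, which requires the whole input to match. This makes it easier to see how many times each quantifier lets `b` repeat. The sample strings are illustrative.

```python
import re

patterns = ["ab*c", "ab+c", "ab?c"]
inputs = ["ac", "abc", "abbc"]

# results[(pattern, text)] is True when the pattern matches the entire text.
results = {
    (p, t): re.fullmatch(p, t) is not None
    for p in patterns
    for t in inputs
}

for p in patterns:
    row = ", ".join(f"{t}={results[(p, t)]}" for t in inputs)
    print(f"{p}: {row}")
# ab*c: ac=True, abc=True, abbc=True
# ab+c: ac=False, abc=True, abbc=True
# ab?c: ac=True, abc=True, abbc=False
```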
By default, regexes are greedy: a quantifier takes as many characters as possible. For example, when a quantifier follows the character 2, the regex matches the whole run of 2's in the input, not just the first one.
There is also stingy (non-greedy) matching, which matches as few characters as possible, but we'll leave it out this time. There are also more quantifiers than those mentioned here, but this introduction doesn't go any deeper into them.
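To compare greedy and stingy matching side by side, here is a short sketch with Python's `re` module. It assumes the Perl-style syntax, where appending `?` to a quantifier (as in `*?`) makes it stingy. The input string is illustrative.

```python
import re

text = "122223"

# Greedy: 2* takes as many 2s as it can.
greedy = re.search("12*", text).group()

# Stingy: 2*? takes as few 2s as it can, which here is none.
stingy = re.search("12*?", text).group()

print(greedy)  # 12222
print(stingy)  # 1
```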
Many special characters are available for building regexes. Here are some of the more common ones.
Here are some likely uses for the special characters.
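As a sketch, here are a few special characters that Perl-style engines commonly support, including Python's `re` module. Whether each one appears in the list above is an assumption. Each is paired with a made-up input.

```python
import re

# (pattern, input) pairs; the comment names what the special character matches.
examples = [
    (r"^abc", "abcdef"),   # ^  : start of the input
    (r"def$", "abcdef"),   # $  : end of the input
    (r"a.c", "abc"),       # .  : any single character except a newline
    (r"\d\d", "room 42"),  # \d : a digit
    (r"\w+", "  hello!"),  # \w : a word character (letter, digit or _)
    (r"a\sb", "a b"),      # \s : a whitespace character
]

found = {}
for pattern, text in examples:
    m = re.search(pattern, text)
    found[pattern] = m.group() if m else None
    print(f"{pattern:6} in {text!r:10} -> {found[pattern]!r}")
```

Because `^` anchors to the start, `re.search(r"^def", "abcdef")` finds nothing even though `def` occurs in the input.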
You can define a character class by putting characters between square brackets. Any one character in the class matches a single character in the input.
Here are some sample uses.
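A minimal Python sketch of character classes follows. It uses illustrative inputs and shows that a class always stands for exactly one character of input.

```python
import re

# [aeiou] matches one character that is any of a, e, i, o or u.
vowels = re.findall("[aeiou]", "regular expression")
print(vowels)  # ['e', 'u', 'a', 'e', 'e', 'i', 'o']

# h[aeu]t matches hat, het and hut, but not hit (i is not in the class)
# and not haut (the class matches only one character).
words = ["hat", "het", "hut", "hit", "haut"]
matching = [w for w in words if re.fullmatch("h[aeu]t", w)]
print(matching)  # ['hat', 'het', 'hut']
```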
Grouping and alternatives
It's often necessary to group things together with parentheses.
With parentheses, you can also define subexpressions to remember after the match has happened. In the below example, the part of the string that matches between the parentheses
In these examples, what was matched by
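The following Python sketch covers grouping, alternatives and remembered subexpressions. Perl exposes remembered text as `$1`, `$2` and so on. In Python the equivalents are `m.group(1)`, `m.group(2)` and so on. All inputs are made up.

```python
import re

# Parentheses let a quantifier repeat a whole group instead of one character.
group_repeat = re.fullmatch(r"(ab)+", "ababab") is not None  # True
char_repeat = re.fullmatch(r"ab+", "ababab") is not None     # False: + repeats only b

# | separates alternatives; the parentheses limit how far the choice extends.
alternative = re.fullmatch(r"I like (cats|dogs)", "I like dogs") is not None  # True

# Each group also remembers the text it matched, numbered from 1.
m = re.search(r"(\d+)-(\d+)", "pages 12-34")
first, second = m.group(1), m.group(2)

print(group_repeat, char_repeat, alternative, first, second)
# True False True 12 34
```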
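So are regexes case sensitive? Yes, by default they are. This means that a lowercase letter in the pattern does not match the same letter in uppercase in the input, and vice versa.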
You can also run a case-insensitive match. In it, the uppercase and lowercase forms of a letter are treated as the same character.
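In Perl, a case-insensitive match is requested with the `/i` modifier. The sketch below uses the Python equivalent, the `re.IGNORECASE` flag. How RegExpr for VB/VBA switches modes is not shown here.

```python
import re

# By default, matching is case sensitive: "h" does not match "H".
sensitive = re.search("hello", "Hello World")

# re.IGNORECASE treats upper- and lowercase letters as equal.
insensitive = re.search("hello", "Hello World", re.IGNORECASE)

print(sensitive)            # None
print(insensitive.group())  # Hello
```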
The above is in no way a complete description of regexes. There are more ways to write them, more special characters, and more quantifiers available. What's available also depends on the implementation. Some regex engines don't implement all of the possibilities, which makes them less suitable for some purposes. If you're interested in learning a more complete set of regex features, see the help file of RegExpr for VB/VBA, which is available as a free download.
Here are a few practical examples of regular expressions. They are provided for learning purposes. In real applications, you should design your regexes carefully to fit their exact purpose.
It's often necessary to check if a string is an email address or not. Here's one way to do it.
This example works for most cases, but it is not based on any standard. It may accept non-working email addresses and reject working ones, so fine-tuning is required.
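For comparison, here is a minimal Python sketch of the same kind of check. The pattern is a hypothetical one written for illustration. It is not the article's regex, and like the article's it follows no standard. It requires one or more name characters, an `@`, and at least two dot-separated domain labels.

```python
import re

# Hypothetical pattern for illustration only; it follows no email standard.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")

def looks_like_email(s: str) -> bool:
    """Return True if the whole of s has the rough shape name@domain.tld."""
    # fullmatch requires the pattern to cover the entire string.
    return EMAIL.fullmatch(s) is not None

for s in ["john.doe@example.com", "no-at-sign.example.com", "user@localhost"]:
    print(s, looks_like_email(s))
```

The last input shows the kind of trade-off the article warns about: `user@localhost` is rejected because the sketch insists on a dot in the domain.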
Date strings are difficult to parse because there are so many variations. You can't always trust VB's own date conversion functions, as the date may come in an unexpected or unsupported format. Let's assume we have a date string in the following format:
If a match is found, you can be sure that the input string is formatted like a date. However, a regex cannot verify that it's a real date. For example, it could just as well be a nonexistent date such as February 31.
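One common approach is to let the regex check the shape and then let a date library check the calendar. The Python sketch below assumes a hypothetical YYYY-MM-DD format, which may differ from the article's. It also uses `{n}`, a Perl-style quantifier meaning "exactly n times" that is not covered above.

```python
import re
from datetime import date

# Assumed format for this sketch: YYYY-MM-DD (hypothetical).
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def parse_date(s: str):
    """Return a date if s is shaped like YYYY-MM-DD and is a real date, else None."""
    m = DATE.fullmatch(s)
    if m is None:
        return None  # not formatted like a date at all
    year, month, day = (int(g) for g in m.groups())
    try:
        return date(year, month, day)  # the regex alone cannot check this part
    except ValueError:
        return None  # date-shaped but not a real date

print(parse_date("2024-02-29"))  # 2024-02-29 (a leap day)
print(parse_date("2023-02-31"))  # None, although the regex itself matches
```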
What if the user gives
Web server logs come in several formats. This is a typical line in a log file.
As you can see, the line consists of several fields in a complex format, and each field looks different from the others. Here is a human-readable definition of the fields:
As you can see, there are fields such as host (visitor's Internet address), date and time (enclosed in square brackets), an HTTP request with file to retrieve (enclosed in quotation marks), numeric status code, numeric size of file, referer field (enclosed in quotation marks), and agent or browser name (enclosed in quotation marks).
To parse the line programmatically, you need to split it into fields and then look at each field. This is a sample regex that splits the line into fields.
In this example, we've left "agent" unmatched. That does no harm because
This example has been taken from a web log file parser script. To use it in your own code, you have to fine-tune it to suit your log file format. The regex assumes that lines come only in the presented format. If, say, a field is missing or the file contains garbage lines, the regex may fail, leaving the line unparsed.
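Here is a sketch of the same idea in Python. It assumes the common Apache "combined" log format, which has two extra fields (ident and user) between the host and the date. The sample line is hypothetical. The sketch uses Python's named-group syntax `(?P<name>...)`, and, like the article's regex, it leaves the agent field unmatched.

```python
import re

# Assumed layout: host ident user [date] "request" status size "referer" "agent"
LOG = re.compile(
    r'(?P<host>\S+) \S+ \S+ '      # host, then skip ident and user
    r'\[(?P<datetime>[^\]]+)\] '   # date and time in square brackets
    r'"(?P<request>[^"]*)" '       # HTTP request in quotation marks
    r'(?P<status>\d{3}) '          # numeric status code
    r'(?P<size>\d+|-) '            # size in bytes, or - when there is none
    r'"(?P<referer>[^"]*)"'        # referer in quotation marks; agent is ignored
)

def parse_line(line: str):
    """Return a dict of fields, or None for a line that does not fit the format."""
    m = LOG.match(line)  # match anchors at the start; trailing text is ignored
    return m.groupdict() if m else None

# Hypothetical sample line for illustration.
SAMPLE = ('192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08"')

print(parse_line(SAMPLE))
print(parse_line("garbage"))  # None: an unparsed line
```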
Regular expressions in Visual Basic
Earlier Visual Basic versions (1.0 to 6.0) didn't come with regular expressions, and neither did VBA. The .NET Framework includes regular expression support.
For non-.NET programming, VB developers have to use a third-party solution. Aivosto RegExpr is a solution that adds comprehensive support for regular expressions. Available as a pure source code module, it is an ideal way to add regexes to Visual Basic 5.0, 6.0 and VBA without any additional run-time file requirements.