Regular expression syntax rules in RegExpr


Repetitions (Quantifiers)

x*        Zero or more x's (greedy, takes as many as possible)
x*?       Zero or more x's (stingy, takes as few as possible)
x+        One or more x's (greedy, takes as many as possible)
x+?       One or more x's (stingy, takes as few as possible)
x?        One or zero x (greedy, try one first)
x??       Zero or one x (stingy, try zero first)
x{n}      Exactly n x's
x{m,n}    At least m and at most n x's (greedy, takes as many as possible)
x{m,n}?   At least m and at most n x's (stingy, takes as few as possible)
x{n,}     At least n x's (greedy, takes as many as possible)
x{n,}?    At least n x's (stingy, takes as few as possible)

See more about greedy and stingy matching.
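
As a rough illustration of the difference between greedy and stingy quantifiers, here is a small sketch. It uses Python's re module rather than RegExpr itself, on the assumption that these quantifiers behave the same way in both engines; the sample strings are made up.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* takes as many characters as possible, so the match
# runs from the first "<" to the last ">".
greedy = re.search(r"<.*>", text).group(0)

# Stingy: .*? takes as few characters as possible, so the match
# stops at the first ">".
stingy = re.search(r"<.*?>", text).group(0)

# Bounded repetition: {2,4} prefers four digits, {2,4}? settles for two.
greedy_digits = re.search(r"\d{2,4}", "123456").group(0)
stingy_digits = re.search(r"\d{2,4}?", "123456").group(0)

print(greedy)         # <b>bold</b> and <i>italic</i>
print(stingy)         # <b>
print(greedy_digits)  # 1234
print(stingy_digits)  # 12
```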

Grouping

(expression)
Use parentheses to group things together for use with operators * + ? | and to remember matched patterns (see Subexpressions).

(?:expression)
Same as (expression), but doesn't create a backreference (matched pattern) like (expression) does.
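
The following sketch shows how (expression) and (?:expression) differ in what they remember. It uses Python's re module as a stand-in for RegExpr (an assumption, not RegExpr's own API); both group forms exist in Python with the meaning described above.

```python
import re

# (ab)+ groups "ab" for the + operator and remembers it as subexpression 1;
# (c) becomes subexpression 2.
capturing = re.search(r"(ab)+(c)", "xababc")

# (?:ab)+ groups "ab" for the + operator without remembering it,
# so (c) becomes subexpression 1.
non_capturing = re.search(r"(?:ab)+(c)", "xababc")

print(capturing.group(0))      # ababc
print(capturing.groups())      # ('ab', 'c')
print(non_capturing.group(0))  # ababc
print(non_capturing.groups())  # ('c',)
```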

Single characters

.         Any single character except a newline. Exception: When reSingleLine is set, accepts a newline as well.
\n        A newline. Depending on the setting of Const NewLine, one of:
            ASCII 13 & ASCII 10 (default),
            ASCII 13, or
            ASCII 10
\r        A return (ASCII 13)
\f        A form feed (ASCII 10)
\t        A tab (ASCII 9)
\a        Alarm bell (ASCII 7)
\b        Backspace character (ASCII 8, inside [ ] only)
\e        Escape character (ASCII 27)
\cA       Control character. Examples: \cA = Ctrl-A (ASCII 1), \cZ = Ctrl-Z (ASCII 26)
\w        Any alphanumeric (word) character. By default, the same as [a-zA-Z0-9_¡-ÿ], that is, the letters a-z and A-Z, the digits 0-9, underscore, and Unicode characters 00A1-00FF. Note that the range 00A1-00FF includes some punctuation characters and some extended Latin letters.
          If you want to use the Perl default [a-zA-Z0-9_] without Unicode 00A1-00FF, set #Const ExtendedCharacters = False in (declarations) of RegExpr.Bas.
\W        Any non-word character. The same as [^\w]
\d        Any digit. The same as [0-9]
\D        Any non-digit. The same as [^0-9]
\s        Any whitespace character: space, tab, form feed, return, or newline.
\S        Any non-whitespace character
\x##      Unicode (ASCII) ## in hexadecimal. Example: \x40 matches @
\0###     Unicode (ASCII) ### in octal. Example: \0100 matches @, \0 matches the null character (ASCII 0)
\         Escape character, used to match special characters. Because the characters + * ? . $ ^ | \ [ ] ( ) { } have a special meaning in regular expressions, you must precede them with a backslash \ to match them literally. Examples: \$ matches $, \( matches (, \\ matches \ etc.
\Q        Quote. Disable special characters until \E. Example: \Q*.*\E matches "*.*" but nothing else.
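
A few of the single-character rules above can be tried out with the sketch below. It is written against Python's re module, not RegExpr, so some details are assumptions: re.DOTALL stands in for the reSingleLine flag, and because Python has no \Q...\E, re.escape() is used as the nearest equivalent for quoting a literal string.

```python
import re

# "." does not match a newline unless the single-line flag is set
# (re.DOTALL plays the role of reSingleLine here).
dot_default = re.search(r"a.b", "a\nb")
dot_single_line = re.search(r"a.b", "a\nb", re.DOTALL)

# \x40 is the character with hexadecimal code 40, that is "@".
at_sign = re.search(r"\x40", "user@example.com")

# A backslash makes a special character match itself.
price = re.search(r"\$\d+\.\d\d", "Total: $19.99").group(0)

# Stand-in for \Q*.*\E: escape every special character in "*.*".
literal = re.compile(re.escape("*.*"))

print(dot_default)                              # None
print(dot_single_line is not None)              # True
print(at_sign.start())                          # 4
print(price)                                    # $19.99
print(literal.search("copy *.* now").group(0))  # *.*
print(literal.search("a.b"))                    # None
```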

Zero-width assertions

Zero-width assertions don't consume the text they match.

\A        Beginning of string
\Z        End of string, or before newline at end-of-string (newline at end-of-string remains unmatched)
\b        A word boundary, outside [] only. A word boundary (\b) is defined as a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order). Start and end of string count as a \W.
          Example: \b[a-z]+\b matches any lower-case word.
\B        No word boundary, outside [] only.
          Example: \b[Tt]he\B matches any word beginning with The or the that is at least 4 characters long.
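
The two word-boundary examples can be checked with a short sketch. It uses Python's re module in place of RegExpr (an assumption; \b and \B follow the same definition there), and the sample sentence is invented. The second pattern has \w* appended so that the whole word is returned rather than just its first three letters.

```python
import re

text = "The theory is there, then the end."

# \b[a-z]+\b: whole words made only of lower-case letters.
# "The" is skipped because it starts with an upper-case letter.
lower_words = re.findall(r"\b[a-z]+\b", text)

# \b[Tt]he\B: "The" or "the" at the start of a word, followed by at
# least one more word character.
the_words = re.findall(r"\b[Tt]he\B\w*", text)

print(lower_words)  # ['theory', 'is', 'there', 'then', 'the', 'end']
print(the_words)    # ['theory', 'there', 'then']
```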

The assertions ^ and $ depend on the reMultiline flag. The default is:

^         Beginning of string
$         End of string, or before newline at end-of-string (newline at end-of-string remains unmatched)

If flag reMultiline is set:

^         Beginning of string or line
$         End of string or line
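
The effect of the multiline flag on ^ and $ can be sketched as follows. The example uses Python's re module, where re.MULTILINE is assumed to correspond to reMultiline; the sample text is made up and ends with a newline to show the "before newline at end-of-string" rule.

```python
import re

text = "first line\nsecond row\n"

# Default: ^ matches only at the beginning of the string.
default_starts = re.findall(r"^\w+", text)
# Multiline: ^ also matches at the beginning of each line.
multi_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Default: $ matches only at the end of the string (here, just before
# the final newline, which itself remains unmatched).
default_ends = re.findall(r"\w+$", text)
# Multiline: $ also matches at the end of each line.
multi_ends = re.findall(r"\w+$", text, re.MULTILINE)

print(default_starts)  # ['first']
print(multi_starts)    # ['first', 'second']
print(default_ends)    # ['row']
print(multi_ends)      # ['line', 'row']
```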

Lookaheads

Lookaheads are zero-width assertions that ensure that what follows must or must not match a given regular expression. Lookaheads don't consume the input.

(?=expression)
What follows must match expression.

Example: /\w+(?=\t)/ matches a word followed by a tab.

(?!expression)
What follows must not match expression.

Example: /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar".

Note that there are no lookbehinds. /(?!foo)bar/ will not find an occurrence of "bar" that is preceded by something which is not "foo". That's because the (?!foo) is just saying that the next thing cannot be "foo", and it isn't: the next thing is "bar". So the "bar" in "foobar" will match.
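
The lookahead examples above, including the /(?!foo)bar/ pitfall, can be reproduced with this sketch. It uses Python's re module as a stand-in for RegExpr (an assumption); the input strings are invented for illustration.

```python
import re

# (?=\t): a word that is followed by a tab. The tab is not consumed.
before_tab = re.findall(r"\w+(?=\t)", "name\tvalue")

# (?!bar): "foo" only where it is not followed by "bar".
foo_positions = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz food")]

# (?!foo)bar looks ahead, not behind: at the position of "bar" the next
# text is "bar", not "foo", so the assertion succeeds inside "foobar".
pitfall = re.search(r"(?!foo)bar", "foobar")

print(before_tab)       # ['name']
print(foo_positions)    # [7, 14]
print(pitfall.start())  # 3
```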

Backreferences

\1        Match the first subexpression. For example, "(\w+) \1" matches any repeated word with a space between them.
\2...\9   Match the second etc. subexpression. At most 9 subexpressions can be matched like this.

Backreferences work only after the subexpression has been matched, that is, a backreference must be located after the corresponding ( ).
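
A minimal sketch of the repeated-word pattern, again using Python's re module rather than RegExpr (an assumption; \1 has the same meaning there). It also shows that without word boundaries the pattern can match inside words, which \b can prevent.

```python
import re

# (\w+) remembers a word; \1 then requires the same text again.
doubled = re.search(r"(\w+) \1", "Paris in the the spring")

# Without \b the pattern can match parts of words: "is is" inside "this is".
partial = re.search(r"(\w+) \1", "this is a test test").group(0)

# With \b on both sides only whole repeated words match.
whole = re.search(r"\b(\w+) \1\b", "this is a test test").group(0)

print(doubled.group(0))  # the the
print(doubled.group(1))  # the
print(partial)           # is is
print(whole)             # test test
```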

Square brackets [character classes]

Square brackets are used to match any one of the characters inside them. [abc] matches any of a, b, or c.

You can also use + ? * after square brackets:
[abc]+ matches any non-empty sequence of the characters a, b and c.

A hyphen indicates "between" in ASCII order: [a-c]
Don't use \n in "between" conditions, use \r and \f instead.

A caret at the beginning means "not": [^d-z]

You can also use most special expressions inside []. However, the expressions + * ? . $ \B | ( ) \1 don't have any special meaning inside [ ]. In addition, \b is ASCII 8, not a word boundary.

If you want to match ], ^ or - inside square brackets, use the escape character: \], \^ or \-.
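
The character class rules above can be tried out with the following sketch, written for Python's re module as a stand-in for RegExpr (an assumption). The test strings are made up.

```python
import re

# [abc]+ : one or more characters, each of which is a, b or c.
runs = re.findall(r"[abc]+", "cabbage back")

# [^d-z] : any character that is NOT in the range d to z.
not_d_to_z = re.findall(r"[^d-z]", "abcxyz")

# . + * lose their special meaning inside [ ].
plain_specials = re.findall(r"[.+*]", "1.5+2*3")

# ], ^ and - are matched inside [ ] by escaping them.
escaped = re.findall(r"[\]\^\-]", "a]b^c-d")

print(runs)            # ['cabba', 'bac']
print(not_d_to_z)      # ['a', 'b', 'c']
print(plain_specials)  # ['.', '+', '*']
print(escaped)         # [']', '^', '-']
```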

Or (alternatives)

A vertical bar | represents an or operator. Parentheses (...) can be used to group things together:

Jesse|Peter|Samuel   Any of Jesse, Peter, and Samuel.
(0|1)+               Any string of 0's and 1's.
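
Both alternation examples can be checked with a short sketch. It uses Python's re module instead of RegExpr (an assumption; | and grouping work the same way there). re.fullmatch is used so that the whole input must consist of 0's and 1's.

```python
import re

# Jesse|Peter|Samuel: any one of the three names.
names = re.findall(r"Jesse|Peter|Samuel", "Peter met Samuel and Jesse")

# (0|1)+: the parentheses group the alternatives so that + applies to both.
binary_ok = re.fullmatch(r"(0|1)+", "010011") is not None
binary_bad = re.fullmatch(r"(0|1)+", "0120") is not None

print(names)       # ['Peter', 'Samuel', 'Jesse']
print(binary_ok)   # True
print(binary_bad)  # False
```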

Miscellaneous

(?#Text)
A comment that is ignored.

Left to right. Regular expressions always take the first string that matches, starting from the left. Out of or'ed expressions, the leftmost one is tried first.
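
The left-to-right rule has two consequences that are easy to overlook: the order of alternatives matters, and the starting position in the text matters more than the order of alternatives. The sketch below illustrates both with Python's re module, which is assumed to follow the same rule as RegExpr here.

```python
import re

# Alternatives are tried left to right, so the shorter "cat" wins
# when it is listed first, even though "category" would also match.
first_listed = re.search(r"cat|category", "category").group(0)
reordered = re.search(r"category|cat", "category").group(0)

# The leftmost position in the text wins: b+ is listed first, but a+
# matches at position 0, before any b occurs.
leftmost = re.search(r"b+|a+", "aaabbb")

print(first_listed)       # cat
print(reordered)          # category
print(leftmost.group(0))  # aaa
print(leftmost.start())   # 0
```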

RegExpr uses Unicode strings. Character values and ranges are expressed in Unicode. In VB, the functions ChrW and AscW are compatible with RegExpr, while Chr and Asc are not.

For more information

Tutorial
VB functions in RegExpr

©Aivosto Oy