Non-capturing groups

Groups that create backreferences are called capturing groups. There are also non-capturing groups, which don’t create backreferences. In very long regular expressions, storing backreferences slows down the matching process. Non-capturing groups give you the same flexibility to match sequences of characters without the overhead of storing the results.
To create a non-capturing group, add a question mark followed by a colon (?:) immediately after the opening parenthesis:
var sToMatch = "#123456789";
var reNumbers = /#(?:\d+)/;   //(?:...) groups \d+ without capturing it
reNumbers.test(sToMatch);
alert(RegExp.$1);             //outputs "" (RegExp.$1 is a legacy property)
The last line of this example outputs an empty string because the group is specified as non-capturing. Because the group stores nothing, it cannot be referenced in the replace() method, accessed through the RegExp.$x variables, or used as a backreference in the regular expression itself. Look what happens when the following code is run:
var sToMatch = "#123456789";
var reNumbers = /#(?:\d+)/;
alert(sToMatch.replace(reNumbers, "abcd$1")); //outputs "abcd$1"
This code outputs “abcd$1” instead of “abcd123456789” because the pattern has no group 1, so “$1” isn’t recognized as a backreference; instead, it is interpreted literally.
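For comparison, the following sketch runs the same input through a capturing version of the pattern. The names reCapturing and reNonCapturing are illustrative only, and console.log() stands in for alert() so the code also runs outside a browser. It uses exec(), whose result array holds the full match followed by one entry per capturing group, so the two patterns can be compared without relying on the legacy RegExp.$x properties.

```javascript
var sToMatch = "#123456789";
var reCapturing = /#(\d+)/;        //(...) stores the digits as group 1
var reNonCapturing = /#(?:\d+)/;   //(?:...) groups the digits without storing them

//exec() returns the full match plus one array entry per capturing group
console.log(reCapturing.exec(sToMatch).length);     //outputs 2
console.log(reNonCapturing.exec(sToMatch).length);  //outputs 1

//$1 is substituted only when the pattern actually has a group 1
console.log(sToMatch.replace(reCapturing, "abcd$1"));     //outputs "abcd123456789"
console.log(sToMatch.replace(reNonCapturing, "abcd$1"));  //outputs "abcd$1"
```

Only the capturing version produces a group 1, so only it can fill in $1 during replacement.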
One very popular use of regular expressions is to strip HTML tags out of text. This technique is common on discussion boards and forums, where it prevents visitors from including malicious or careless HTML in their postings. Stripping HTML tags with regular expressions is trivial; you need just one simple expression:
var reTag = /<(?:.|\s)*?>/g;
This expression matches a less-than symbol (<) followed by any text, including line breaks (the non-capturing group (?:.|\s) matches any character, because . alone does not match a newline), followed by a greater-than symbol (>). Because the *? quantifier is lazy, each match stops at the first > it reaches, so the expression matches each HTML tag separately. The non-capturing group is used in this case because it doesn’t matter what appears between the less-than and greater-than symbols (it all must be removed). You can use the replace() method with this pattern to create your own stripHTML() method for a String:
String.prototype.stripHTML = function () {
    var reTag = /<(?:.|\s)*?>/g;
    return this.replace(reTag, ""); //remove every tag
};
Using this method is equally simple:
var sTest = "<b>This would be bold</b>";
alert(sTest.stripHTML()); //outputs "This would be bold"
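The prototype method above can also be written as a plain function, which some developers prefer because adding methods to built-in objects can clash with other scripts. The following is a minimal sketch under that assumption: the stand-alone stripHTML() function and the sample strings are hypothetical, and console.log() replaces alert() so it runs outside a browser. It also shows why the lazy *? quantifier matters. Like the original expression, it handles simple markup only.

```javascript
//Hypothetical stand-alone version of stripHTML(); String.prototype is left untouched
function stripHTML(sText) {
    //"<", then as few characters as possible (newlines included), then ">"
    var reTag = /<(?:.|\s)*?>/g;
    return sText.replace(reTag, "");
}

var sTest = "<b>This would be bold</b> and <i>this italic</i>";

//with the g flag, match() returns every tag the pattern finds
var aTags = sTest.match(/<(?:.|\s)*?>/g);
console.log(aTags.join(", "));   //outputs "<b>, </b>, <i>, </i>"

console.log(stripHTML(sTest));   //outputs "This would be bold and this italic"

//the \s alternative lets a tag span lines, which . alone would not match
console.log(stripHTML("<a\nhref=\"page.htm\">link</a>"));  //outputs "link"

//a greedy * runs from the first < to the last >, removing the text as well
console.log(sTest.replace(/<(?:.|\s)*>/g, ""));  //outputs ""
```

With the greedy pattern, the whole sample string disappears because it begins with < and ends with >; the lazy version removes only the tags.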