HTML code strip regexp problem
In JavaScript, one popular use of regular expressions is stripping HTML tags from text. The code for that is:
String.prototype.stripHTML = function () {
var reTag = /<(?:.|\s)*?>/g;
return this.replace(reTag, "");
};
If you try this on "<b>This would be bold</b>".stripHTML(), it outputs "This would be bold". Shouldn't it output ""?
Doesn't this regex say to match everything that starts with < and ends with >? Why didn't the match start at the < of <b> and end at the > of </b>?
You are using a non-greedy (lazy) quantifier:
(?:.|\s)*?
         ^
This causes the match to be the shortest possible, instead of the default (greedy) behaviour, which is to match as much as possible.
<b>This would be bold</b>
^-^                  ^--^    Non-greedy: <(?:.|\s)*?>
^-----------------------^    Greedy    : <(?:.|\s)*>
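To see the difference directly, here is a small sketch that runs both patterns on the string from the question. The variable names (lazyTag, greedyTag and so on) are only for illustration and are not part of the original code.

```javascript
// Same input as in the question.
const input = "<b>This would be bold</b>";

// Non-greedy: each match stops at the first ">" after a "<".
const lazyTag = /<(?:.|\s)*?>/g;
// Greedy: the match runs from the first "<" to the last ">".
const greedyTag = /<(?:.|\s)*>/g;

const lazyResult = input.replace(lazyTag, "");
const greedyResult = input.replace(greedyTag, "");

console.log(JSON.stringify(lazyResult));   // "This would be bold"
console.log(JSON.stringify(greedyResult)); // ""
```

For stripping tags, the non-greedy version is usually the one you want, since the greedy one also removes the text between the first and last tag.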
Yes, but the *? performs a non-greedy match (the shortest match possible):
var reTag = /<(?:.|\s)*?>/g;
To perform a greedy match (the longest match possible), remove the ? after the *:
var reTag = /<(?:.|\s)*>/g;
It's not a greedy regex, meaning that it stops at the first > it comes across, so <b> and </b> are separate matches.
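One way to check this is to list the matches with String.prototype.match instead of replacing them. This is just a quick sketch; the names sample, lazyMatches and greedyMatches are made up for the example.

```javascript
const sample = "<b>This would be bold</b>";

// With the g flag, match() returns an array of every matched substring.
const lazyMatches = sample.match(/<(?:.|\s)*?>/g);
const greedyMatches = sample.match(/<(?:.|\s)*>/g);

console.log(lazyMatches);   // [ '<b>', '</b>' ]
console.log(greedyMatches); // [ '<b>This would be bold</b>' ]
```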