
HTML code strip regexp problem

In JavaScript, one popular use of regular expressions is stripping HTML tags from text. The code for that is:

String.prototype.stripHTML = function () {
    var reTag = /<(?:.|\s)*?>/g;
    return this.replace(reTag, "");
};

If you try "<b>This would be bold</b>".stripHTML(), the output is "This would be bold". Shouldn't the output be ""?

Doesn't this regex say to match everything that starts with < and ends with >? Why didn't the match start at the < of <b> and end at the > of </b>?


You are using a non-greedy (lazy) quantifier.

(?:.|\s)*?
         ^

This makes each match as short as possible, instead of the default behavior, which is to match as much as possible.

<b>This would be bold</b>
^-^                  ^--^     Non-greedy: <(?:.|\s)*?>
^-----------------------^     Greedy    : <(?:.|\s)*>
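
The diagram above can be checked directly. This is a minimal sketch, assuming a plain Node.js or browser console; the variable names (sample, lazyTag, greedyTag, lazyResult, greedyResult) are made up for illustration and are not part of the original code.

```javascript
// Same input as in the question.
const sample = "<b>This would be bold</b>";

// Non-greedy: each match stops at the first ">" it can reach.
const lazyTag = /<(?:.|\s)*?>/g;

// Greedy: the match runs on to the last ">" in the input.
const greedyTag = /<(?:.|\s)*>/g;

const lazyResult = sample.replace(lazyTag, "");
const greedyResult = sample.replace(greedyTag, "");

console.log(JSON.stringify(lazyResult));   // "This would be bold"
console.log(JSON.stringify(greedyResult)); // ""
```

The non-greedy pattern removes the two tags separately and leaves the text between them. The greedy pattern consumes the whole string in one match, which is the empty result the question expected.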


Yes, but the *? quantifier performs a non-greedy (shortest possible) match:

var reTag = /<(?:.|\s)*?>/g; 

To perform a greedy match (the longest match possible), remove the ? that follows the *:

var reTag = /<(?:.|\s)*>/g; 


It's not a greedy regex, meaning that it stops at the first > it comes across, so the <b> and </b> are separate matches.
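
One way to see the separate matches is to list them with String.prototype.match and the g flag. This is only an illustrative sketch; the names input, lazyMatches, and greedyMatches are assumptions introduced here, not part of the original code.

```javascript
const input = "<b>This would be bold</b>";

// With the g flag, match() returns an array of every matched substring.
const lazyMatches = input.match(/<(?:.|\s)*?>/g);
const greedyMatches = input.match(/<(?:.|\s)*>/g);

console.log(lazyMatches);   // [ '<b>', '</b>' ]
console.log(greedyMatches); // [ '<b>This would be bold</b>' ]
```

The non-greedy version finds two short matches, one per tag, while the greedy version finds a single match that spans the entire string.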
