
JavaScript Regular Expression not matching <a> tags

I am trying to match URLs with a regular expression that I have already tested elsewhere, but when I evaluate it in JavaScript, test() returns false.

Here is my code:

var $regex = new RegExp("<a\shref=\"(\#\d+|(https?|ftp):\/\/[-a-z0-9+&@#\/%?=~_|!:,.;\\(\\)]+)\"(\stitle=\"[^\"<>]+\")?\s?>|<\/a>");

var $test = new Array();
$test[0] = '<a href="http://www.nytimes.com/imagepages/2010/09/02/us/HURRICANE.html">';
$test[1] = '<a href="http://www.msnbc.msn.com/id/38877306/ns/weather/%29;">';
$test[2] = '<a href="http://www.msnbc.msn.com/id/38927104" title="dd" alt="dd">';
for(var i = 0; i < $test.length; i++)
{
    console.log($test[i]);
    console.log($regex.test($test[i]));
}

Anyone have any idea what is going on?


You need to escape backslashes when you create a regular expression with new RegExp(). You are passing a string, and the backslash is also the escape character in string literals, so a single backslash is consumed before the regex engine ever sees it.

new RegExp("\s"); // becomes /s/
new RegExp("\\s"); // becomes /\s/

Or just write your regular expressions as literals:

var re = /\s/;
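A small sketch comparing the forms side by side. The String.raw tag (standard since ES2015, not mentioned in the answer) is one more option worth knowing: it keeps the backslashes in a template string exactly as written, so the pattern can be typed the way it would appear in a literal.

```javascript
// Regex literal: the regex parser reads the backslashes directly.
const fromLiteral = /<a\shref="/;

// Constructor with an ordinary string: every regex backslash is doubled.
const fromString = new RegExp("<a\\shref=\"");

// Constructor with String.raw: the template keeps "\s" verbatim.
const fromRaw = new RegExp(String.raw`<a\shref="`);

// All three compile to the same pattern source.
console.log(fromLiteral.source); // <a\shref="
console.log(fromString.source);  // <a\shref="
console.log(fromRaw.source);     // <a\shref="
```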

Also, if you only want to match URLs, why take the whole HTML tag into account? The following regexp would suffice:

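var urlReg = /^(?:https?|ftp):\/\/[\w.-]+\/\S*/i;
// Host: word characters, dots and hyphens. (Writing [\.-_] would instead create the range "." to "_", which also admits "/", ":", "?" and more.)
// Anything after the third / that is not whitespace is accepted.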
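A quick way to try a URL-only pattern against bare URLs (not whole tags, since it is anchored with ^). The pattern here is an assumption about the intent: the scheme group is read as https?|ftp, the host class as [\w.-], and a / is still required after the host, as the comment above describes.

```javascript
// Hypothetical URL-only pattern: scheme, "://", a host made of word
// characters, dots and hyphens, then "/" and any run of non-whitespace.
const urlReg = /^(?:https?|ftp):\/\/[\w.-]+\/\S*/i;

const samples = [
  "http://www.nytimes.com/imagepages/2010/09/02/us/HURRICANE.html",
  "ftp://example.org/pub/file.txt",
  "http://example.org",        // no path: the pattern requires a "/" after the host
  "mailto:someone@example.org" // not an http(s) or ftp URL
];

for (const url of samples) {
  console.log(url, urlReg.test(url));
}
```

Only the first two samples match; drop the \/\S* tail if host-only URLs should pass too.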


There are multiple problems.

You need to escape backslashes. Regex escapes such as \s and \d, and escaped metacharacters such as \., all start with a backslash, and when the pattern is written as a string, that backslash must itself be escaped. Effectively, \s has to be written as \\s if you construct the regex with new RegExp("\\s").
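Printing the source of the question's string-built regex shows what the regex engine actually receives: the single backslashes were consumed by the string literal, so \s became a literal s and \d a literal d. The second regex below is my own rewrite of the same pattern with every regex backslash doubled; it shows that fixing the escaping alone still leaves the first and third tests failing, because the URL class has no uppercase letters and the pattern has no room for the alt attribute.

```javascript
// The question's pattern, built from a string with single backslashes.
const asWritten = new RegExp("<a\shref=\"(\#\d+|(https?|ftp):\/\/[-a-z0-9+&@#\/%?=~_|!:,.;\\(\\)]+)\"(\stitle=\"[^\"<>]+\")?\s?>|<\/a>");

// The same pattern with each regex backslash doubled for the string literal.
const escaped = new RegExp("<a\\shref=\"(#\\d+|(https?|ftp):\\/\\/[-a-z0-9+&@#\\/%?=~_|!:,.;\\(\\)]+)\"(\\stitle=\"[^\"<>]+\")?\\s?>|<\\/a>");

// The engine sees "<ashref=" instead of "<a\shref=".
console.log(asWritten.source);

const samples = [
  '<a href="http://www.nytimes.com/imagepages/2010/09/02/us/HURRICANE.html">',
  '<a href="http://www.msnbc.msn.com/id/38877306/ns/weather/%29;">',
  '<a href="http://www.msnbc.msn.com/id/38927104" title="dd" alt="dd">'
];

console.log(samples.map(s => asWritten.test(s))); // all false
console.log(samples.map(s => escaped.test(s)));   // false, true, false
```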

You need to allow more characters in your URLs. Your character class has no uppercase letters and the regex has no i flag, so HURRICANE.html in the first test can never match. I would propose a character class like [^"] to match everything after http://. (Inside a double-quoted string the " must be escaped, which makes it [^\"].)
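A minimal comparison of the options for the URL part, written as regex literals and trimmed to just the href portion of the pattern (the trimming is mine): the question's whitelist class, the same class with the i flag, and the suggested [^"]+.

```javascript
const tag = '<a href="http://www.nytimes.com/imagepages/2010/09/02/us/HURRICANE.html">';

// Whitelist class from the question: lowercase letters only.
const whitelist = /<a\shref="(https?|ftp):\/\/[-a-z0-9+&@#\/%?=~_|!:,.;()]+"/;
// Same class, made case-insensitive with the i flag.
const whitelistI = /<a\shref="(https?|ftp):\/\/[-a-z0-9+&@#\/%?=~_|!:,.;()]+"/i;
// Suggested class: any run of characters other than the closing quote.
const notQuote = /<a\shref="(https?|ftp):\/\/[^"]+"/;

console.log(whitelist.test(tag));  // false: "H" is not in [a-z]
console.log(whitelistI.test(tag)); // true
console.log(notQuote.test(tag));   // true
```

[^"]+ is more permissive than the whitelist: it accepts any character up to the closing quote, which suits checking tag shape but not validating the URL itself.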

You're not taking alt attributes into account. Your pattern allows an optional title attribute but not an alt attribute, so the third test, which has both, fails.
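To isolate the attribute problem, here is the third test string against an optional title group alone and against an optional title followed by an optional alt group. This is a sketch: like the working example, it still assumes title (if present) comes before alt, and it accepts no other attributes.

```javascript
const tag = '<a href="http://www.msnbc.msn.com/id/38927104" title="dd" alt="dd">';

// Only an optional title is allowed before ">".
const titleOnly = /<a\shref="[^"]+"(\stitle="[^"]+")?\s?>/;
// An optional alt attribute is allowed after the optional title.
const titleAndAlt = /<a\shref="[^"]+"(\stitle="[^"]+")?(\salt="[^"]+")?\s?>/;

console.log(titleOnly.test(tag));   // false: ' alt="dd"' stands between the title and ">"
console.log(titleAndAlt.test(tag)); // true
```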

A working example:

// Ditch new RegExp("...") in favour of /.../ because it is simpler.
var $regex = /<a\shref="(#\d+|(https?|ftp):\/\/[^"]+)"(\stitle="[^"]+")?(\salt="[^"]+")?|<\/a>/;

var $test = new Array();
$test[0] = '<a href="http://www.nytimes.com/imagepages/2010/09/02/us/HURRICANE.html">';
$test[1] = '<a href="http://www.msnbc.msn.com/id/38877306/ns/weather/%29;">';
$test[2] = '<a href="http://www.msnbc.msn.com/id/38927104" title="dd" alt="dd">';
for(var i = 0; i < $test.length; i++)
{
    console.log($test[i]);
    console.log($regex.test($test[i]));
}

All three examples match this regex.
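If you need the URL itself rather than a yes/no answer, exec (or String.prototype.match without the g flag) returns the capture groups. With the pattern from the working example, group 1 holds the href value and group 2 the scheme. A short sketch:

```javascript
const $regex = /<a\shref="(#\d+|(https?|ftp):\/\/[^"]+)"(\stitle="[^"]+")?(\salt="[^"]+")?|<\/a>/;

const tag = '<a href="http://www.msnbc.msn.com/id/38927104" title="dd" alt="dd">';
const m = $regex.exec(tag);

// m[1] is the href value, m[2] the scheme; both are undefined
// when only the "</a>" alternative matched.
if (m) {
  console.log(m[1]); // http://www.msnbc.msn.com/id/38927104
  console.log(m[2]); // http
}
```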
