JavaScript Regular Expression not matching <a> tags
I am trying to match URLs with a regular expression that I have already tested elsewhere, but when I evaluate it in JavaScript it returns false.
Here is my code:
var $regex = new RegExp("<a\shref=\"(\#\d+|(https?|ftp):\/\/[-a-z0-9+&@#\/%?=~_|!:,.;\\(\\)]+)\"(\stitle=\"[^\"<>]+\")?\s?>|<\/a>");
var $test = new Array();
$test[0] = '<a href="http://www.nytimes.com/imagepages/2010/09/02/us/HURRICANE.html">';
$test[1] = '<a href="http://www.msnbc.msn.com/id/38877306/ns/weather/%29;">';
$test[2] = '<a href="http://www.msnbc.msn.com/id/38927104" title="dd" alt="dd">';
for(var i = 0; i < $test.length; i++)
{
console.log($test[i]);
console.log($regex.test($test[i]));
}
Anyone have any idea what is going on?
You need to escape backslashes when creating regular expressions with new RegExp()
since you pass a string and a backslash is also an escaping character for strings.
new RegExp("\s"); // becomes /s/
new RegExp("\\s"); // becomes /\s/
Or just write your regexp as a literal:
var re = /\s/;
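A quick way to see the difference is to inspect the source property of each regex. This is only a small sketch; the variable names are made up for illustration.

```javascript
// The string literal "\s\d" loses its backslashes before RegExp ever sees it.
var unescaped = new RegExp("\s\d");
// Doubling the backslashes keeps one of each in the resulting pattern.
var escaped = new RegExp("\\s\\d");
// A regex literal needs no doubling at all.
var literal = /\s\d/;

console.log(unescaped.source);                  // sd
console.log(escaped.source);                    // \s\d
console.log(escaped.source === literal.source); // true
console.log(unescaped.test(" 1"), escaped.test(" 1")); // false true
```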
Also, if you want to match URLs, why take a whole HTML tag into account? The following regexp would suffice:
var urlReg = /^(?:https?|ftp):\/\/[\w.-]+\/\S*/i;
// Anything after the third / that is not whitespace is accepted.
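For illustration, here is a sketch of how a bare-URL pattern of this kind behaves. It assumes the intended schemes are http, https, and ftp; the name urlPattern and the example.com samples are placeholders, not part of the question.

```javascript
// Assumed pattern: scheme, "://", a host made of word characters, dots and
// hyphens, a slash, then any run of non-whitespace characters.
var urlPattern = /^(?:https?|ftp):\/\/[\w.-]+\/\S*/i;

var samples = [
  "http://www.nytimes.com/imagepages/2010/09/02/us/HURRICANE.html",
  "ftp://example.com/file.txt",
  "http://example.com",             // no slash after the host
  '<a href="http://example.com/">'  // a whole tag, not a bare URL
];

var urlResults = samples.map(function (s) { return urlPattern.test(s); });
console.log(urlResults); // [ true, true, false, false ]
```

Note that the pattern is anchored with ^, so it only accepts strings that start with the URL, and it requires a slash after the host name.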
There are multiple problems.
You need to escape backslashes. Any character with a special meaning needs to be escaped with a backslash in the regular expression, and the backslash itself needs to be escaped in the string. Effectively, \s should be written as \\s when you construct the regex with new RegExp("\\s").
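You need to allow more characters in your URLs. Your character class contains no uppercase letters, for example, and the regex has no i flag, so HURRICANE.html in the first test string cannot match. I would propose a character class like [^"] to match everything after http:// up to the closing quote. (Escaping the " character when it is used in a string makes it [^\"].)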
You're not taking alt attributes into account. You only match title attributes, not alt attributes.
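To check which of these problems affects which test string, the question's pattern can be rewritten as a literal so that only the backslash problem is fixed. This is a sketch: the names original, tests, and withFlag are mine, and inside the literal the quotes and parentheses no longer need escaping.

```javascript
// The question's pattern, unchanged apart from being written as a literal.
var original = /<a\shref="(#\d+|(https?|ftp):\/\/[-a-z0-9+&@#\/%?=~_|!:,.;()]+)"(\stitle="[^"<>]+")?\s?>|<\/a>/;

var tests = [
  '<a href="http://www.nytimes.com/imagepages/2010/09/02/us/HURRICANE.html">',
  '<a href="http://www.msnbc.msn.com/id/38877306/ns/weather/%29;">',
  '<a href="http://www.msnbc.msn.com/id/38927104" title="dd" alt="dd">'
];

var results = tests.map(function (t) { return original.test(t); });
console.log(results); // [ false, true, false ]

// The first string fails on the uppercase letters in HURRICANE;
// adding the i flag is enough to make it match.
var withFlag = new RegExp(original.source, "i");
console.log(withFlag.test(tests[0])); // true
// The third string still fails because of the alt attribute.
console.log(withFlag.test(tests[2])); // false
```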
A working example:
// Ditch new RegExp("...") in favour of /.../ because it is simpler.
var $regex = /<a\shref="(#\d+|(https?|ftp):\/\/[^"]+)"(\stitle="[^"]+")?(\salt="[^"]+")?|<\/a>/;
var $test = new Array();
$test[0] = '<a href="http://www.nytimes.com/imagepages/2010/09/02/us/HURRICANE.html">';
$test[1] = '<a href="http://www.msnbc.msn.com/id/38877306/ns/weather/%29;">';
$test[2] = '<a href="http://www.msnbc.msn.com/id/38927104" title="dd" alt="dd">';
for(var i = 0; i < $test.length; i++)
{
console.log($test[i]);
console.log($regex.test($test[i]));
}
All three examples match this regex.
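If the goal is the URL itself rather than a true/false answer, the first capture group of the same pattern holds it. A minimal sketch, assuming the working regex above; extractHref and tagRegex are hypothetical names introduced here.

```javascript
// Same pattern as the working example, reused to pull the URL out.
var tagRegex = /<a\shref="(#\d+|(https?|ftp):\/\/[^"]+)"(\stitle="[^"]+")?(\salt="[^"]+")?|<\/a>/;

// Hypothetical helper: returns the href value, or null when there is none.
function extractHref(tag) {
  var match = tagRegex.exec(tag);
  // Group 1 is undefined when the closing-tag alternative (</a>) matched.
  return match && match[1] !== undefined ? match[1] : null;
}

console.log(extractHref('<a href="http://www.msnbc.msn.com/id/38927104" title="dd" alt="dd">'));
// http://www.msnbc.msn.com/id/38927104
console.log(extractHref('<a href="#12">')); // #12
console.log(extractHref('</a>'));           // null
console.log(extractHref('<b>'));            // null
```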