
Detecting International Characters In Regular Expressions

Here's a regular expression to detect product pages on Amazon. It works for pages in standard English but not for URLs with international characters, so URL2 is not detected. How do I get around this? Thanks.

var URL1 = "http://www.amazon.com/Big-Short-Inside-Doomsday-Machine/dp/0393338827/";
var URL2 = "http://www.amazon.fr/Larm%C3%A9e-furieuse-Fred-Vargas/dp/2878583760/";

var regex1 = RegExp("http://www.amazon.(com|co.uk|de|ca|it|fr|cn|co.jp)/([\\w-]+/)?(dp|gp/product)/(\\w+/)?(\\w{10})");
var m = URL1.match(regex1);  // matches; m[5] is "0393338827"
var m2 = URL2.match(regex1); // null


% doesn't match \w, so Larm%C3%A9e-furieuse-Fred-Vargas doesn't match [\w-]+. Why not just use [^/]+?
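A quick way to confirm this, as a minimal sketch (the variable names below are illustrative, not from the question): the %C3%A9 sequence is the UTF-8 percent-encoding of "é", so the title segment is plain ASCII apart from the % signs, and % is exactly the character \w does not cover.

```javascript
// Path segment taken from URL2 in the question.
const segment = "Larm%C3%A9e-furieuse-Fred-Vargas/";

// \w is [A-Za-z0-9_], so "%" is not a word character.
console.log(/\w/.test("%")); // false

// The question's segment pattern, anchored so it must cover the whole segment.
const wordSegment = /^[\w-]+\/$/;
// The suggested alternative: any run of characters other than a slash.
const anySegment = /^[^\/]+\/$/;

console.log(wordSegment.test(segment)); // false
console.log(anySegment.test(segment));  // true

// Decoding the percent-escapes shows the original title.
console.log(decodeURIComponent(segment)); // prints Larmée-furieuse-Fred-Vargas/
```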

PS: "." matches any character, so to match a literal dot you should use the pattern \., which has to be written as \\. inside the string literal.

RegExp("http://www\\.amazon\\.(ca|cn|co\\.(jp|uk)|com|de|fr|it)/([^/]+/)?(dp|gp/product)/(\\w+/)?(\\w{10})");
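To check the corrected pattern end to end, here is a minimal sketch that runs it against both URLs from the question. It assumes the URLs include the http:// prefix the pattern requires, and productId is a hypothetical helper name. Note that the nested (jp|uk) group shifts the capture numbering, so the 10-character product ID lands in group 6 rather than group 5.

```javascript
// Corrected pattern from the answer: literal dots escaped, any path segment allowed.
const amazonProduct = new RegExp(
  "http://www\\.amazon\\.(ca|cn|co\\.(jp|uk)|com|de|fr|it)/([^/]+/)?(dp|gp/product)/(\\w+/)?(\\w{10})"
);

// Hypothetical helper: returns the 10-character product ID, or null if no match.
// The nested (jp|uk) group means the ID is capture group 6, not 5.
function productId(url) {
  const m = url.match(amazonProduct);
  return m ? m[6] : null;
}

const URL1 = "http://www.amazon.com/Big-Short-Inside-Doomsday-Machine/dp/0393338827/";
const URL2 = "http://www.amazon.fr/Larm%C3%A9e-furieuse-Fred-Vargas/dp/2878583760/";

console.log(productId(URL1)); // 0393338827
console.log(productId(URL2)); // 2878583760

// With the dots escaped, a look-alike host is rejected; the original
// unescaped "." would have accepted any character in those positions.
console.log(productId("http://wwwXamazonXcom/dp/0393338827/")); // null
```

If you only need the ID, making the inner group non-capturing, (?:jp|uk), keeps it in group 5 as in the original regex.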