JavaScript regex Unicode help

In JavaScript, I am using a regex, split(/\W+/), to split a string into words.

When I split this string, it returns the wrong value:

var s3 = "bardzo dziękuję";
s3 = s3.split(/\W+/);


[0]: "bardzo"
[1]: "dzi"
[2]: "kuj"

How can I fix this problem? Please advise.


The regex splits in the wrong places because it treats your accented characters as non-word characters: \w matches only [A-Za-z0-9_], so "ę" falls under \W and acts as a separator.
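A quick way to check this is sketched below. It assumes the string from the question and a precomposed "ę" (U+0119); the `normalize("NFC")` call is only there to make that assumption hold, and the variable names are illustrative.

```javascript
// Sketch: \w is shorthand for [A-Za-z0-9_], so "ę" is matched by \W instead.
const s3 = "bardzo dziękuję".normalize("NFC"); // same string as in the question

const isWordChar = /\w/.test("ę".normalize("NFC")); // false
const parts = s3.split(/\W+/); // the space and each "ę" act as separators

console.log(isWordChar); // false
console.log(parts);      // [ 'bardzo', 'dzi', 'kuj', '' ]
```

The trailing empty string appears because the final "ę" is itself a separator at the end of the input.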

Split on the whitespace character class instead:

s3 = s3.split(/\s+/);


In this case, why not just split on whitespace?

s3.split(/\s+/);
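Splitting on whitespace leaves any punctuation attached to the words. If that matters, one possible alternative is a Unicode-aware character class. The sketch below assumes an engine that supports Unicode property escapes (ES2018 or later); the input string and variable names are hypothetical, not from the question.

```javascript
// Sketch, assuming ES2018+ (Unicode property escapes require the u flag).
// \p{L} = letters, \p{M} = combining marks, \p{N} = numbers.
const text = "bardzo dziękuję, naprawdę!"; // hypothetical input with punctuation

const words = text
  .split(/[^\p{L}\p{M}\p{N}_]+/u) // split on runs of non-word characters
  .filter(Boolean);               // drop empty strings left by edge separators

console.log(words); // [ 'bardzo', 'dziękuję', 'naprawdę' ]
```

The filter step is a design choice: without it, the trailing "!" would leave an empty string at the end of the array.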


You could use CharFunk (https://raw.github.com/joelarson4/CharFunk), which handles Unicode fully.

var s3 = "bardzo dziękuję";

function notLetterOrDigit(ch) {
    return !CharFunk.isLetterOrDigit(ch);
}

CharFunk.splitOnMatches(s3, notLetterOrDigit);