Regular expression for UTF-8 language
My application also supports Punjabi (\u0A00-\u0A7F). I tried the following code:
Pattern classPattern = Pattern.compile("\u0A00-\u0A7F ");
Matcher classMatcher = classPattern.matcher("ਭਾਸ਼ਾ ਸੰਦ");
if (classMatcher.find()) {
    System.out.println("yes");
} else {
    System.out.println("no");
}
I am getting "no" as output even though I provided Punjabi characters in matcher().
Any idea why?
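Should that pattern be "[\u0A00-\u0A7F ]"? Without the square brackets, the pattern is a literal sequence (U+0A00, a hyphen, U+0A7F, a space) rather than a character range. It looks to me like you're trying to match four characters in a specific order, but give the matcher six characters as input.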
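To illustrate the bracketed form, here is a minimal sketch of the question's check using a character class. The class and method names are made up for this example, and the sample text is the question's string written with Unicode escapes so the source file encoding does not matter.

```java
import java.util.regex.Pattern;

public class PunjabiFindDemo {
    // Character class: any single character in the Gurmukhi block
    // (U+0A00 to U+0A7F) or a space. The doubled backslash passes the
    // escape through to the regex engine.
    static final Pattern GURMUKHI_OR_SPACE = Pattern.compile("[\\u0A00-\\u0A7F ]");

    // True if the input contains at least one Gurmukhi character or a space.
    public static boolean containsGurmukhiOrSpace(String input) {
        return GURMUKHI_OR_SPACE.matcher(input).find();
    }

    public static void main(String[] args) {
        // The question's sample text, spelled out as escapes.
        String punjabi = "\u0A2D\u0A3E\u0A38\u0A3C\u0A3E \u0A38\u0A70\u0A26";
        System.out.println(containsGurmukhiOrSpace(punjabi) ? "yes" : "no"); // prints "yes"
        System.out.println(containsGurmukhiOrSpace("abc") ? "yes" : "no");   // prints "no"
    }
}
```

Note that find() only needs one matching character, and the class includes a space, so any input containing a space also reports "yes". Drop the space from the class if that is not what you want.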
[\u0A00-\u0A7F ]*
Without the asterisk, you'll match only a single character. You can replace the * with +; then empty strings won't be accepted.
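The difference between * and + shows up when the whole input has to consist of Punjabi characters. The following is a rough sketch using matches(), which requires the entire string to fit the pattern. The class and method names are invented for this example.

```java
import java.util.regex.Pattern;

public class PunjabiQuantifierDemo {
    // Zero or more Gurmukhi characters or spaces: the empty string is accepted.
    static final Pattern ZERO_OR_MORE = Pattern.compile("[\\u0A00-\\u0A7F ]*");
    // One or more Gurmukhi characters or spaces: the empty string is rejected.
    static final Pattern ONE_OR_MORE = Pattern.compile("[\\u0A00-\\u0A7F ]+");

    // True only if the entire input fits the given pattern.
    public static boolean matchesWhole(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        // The question's sample text, spelled out as escapes.
        String punjabi = "\u0A2D\u0A3E\u0A38\u0A3C\u0A3E \u0A38\u0A70\u0A26";
        System.out.println(matchesWhole(ZERO_OR_MORE, punjabi));  // true
        System.out.println(matchesWhole(ONE_OR_MORE, punjabi));   // true
        System.out.println(matchesWhole(ZERO_OR_MORE, ""));       // true
        System.out.println(matchesWhole(ONE_OR_MORE, ""));        // false
        System.out.println(matchesWhole(ONE_OR_MORE, "asdsa "));  // false
    }
}
```

If you only need to know whether some Punjabi character occurs anywhere in the input, find() with the single-character class is enough and no quantifier is needed.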
Take a look at the Pattern class JavaDocs. It's extremely useful to get a good and quick understanding of regexes.
Because "asdsa " is not Punjabi. The pattern you gave looks for Punjabi characters, and Latin letters such as "abc" aren't.