
Pattern matching with Chinese characters (encoded in UTF-8) in Java

I need to check whether a Chinese province is contained within an address in Chinese.


I am able to read and write Chinese characters easily.

I tried to use the indexOf() method of String to check whether a province (e.g. 广东) is contained within an address (中国 广东). However, this always returns -1.

When I try to check for numbers (e.g. whether 103 is contained within 9910399) it works fine.

Do I need to do something different to handle UTF-8 string matching? Thanks. Matt


I just tried your example. I do not have Chinese fonts installed on my system, so the characters are not displayed correctly, but indexOf() works fine for me.

So check the encoding of your source files (*.java). For example, if you are using Eclipse, check it under Window/Preferences/General/Workspace/Text file encoding. I am using UTF-8.

The second thing is the encoding used by the Java compiler. With Eclipse you do not have to specify anything. For javac, I think you should set the encoding explicitly using -encoding; otherwise the default OS encoding will probably be used.
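
To rule out the source-file encoding as the cause, here is a minimal sketch that writes the two strings from the question (中国 广东 and 广东) as Unicode escapes, so the literals do not depend on how the file is saved or compiled. The class name ProvinceMatch and the helpers containsProvince and misdecode are hypothetical names used only for this illustration. The misdecode helper simulates what I assume happens when a UTF-8 source file is read with a different charset (ISO-8859-1 is picked here only as an example): the literal turns into different characters, and indexOf() returns -1.

```java
import java.nio.charset.StandardCharsets;

class ProvinceMatch {
    // Hypothetical helper: true if the address contains the province name.
    static boolean containsProvince(String address, String province) {
        return address.indexOf(province) >= 0;
    }

    // Hypothetical helper: simulates a UTF-8 literal being decoded with the
    // wrong charset (ISO-8859-1 is an assumption chosen for illustration).
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the literals independent of the file encoding.
        String address = "\u4e2d\u56fd \u5e7f\u4e1c"; // the address from the question
        String province = "\u5e7f\u4e1c";             // the province from the question

        // Correctly decoded strings match as expected.
        System.out.println(address.indexOf(province));            // prints 3

        // A wrongly decoded literal no longer matches.
        System.out.println(address.indexOf(misdecode(province))); // prints -1
    }
}
```

If this version prints 3 while the same code with the characters typed directly prints -1, the literals are probably being decoded with the wrong charset at compile time. In that case, compiling with `javac -encoding UTF-8 ProvinceMatch.java` should make the literal version behave the same way.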

Good luck.
