Pattern matching with Chinese characters (encoded in UTF-8) in Java
I need to check whether a Chinese province is contained within an address in Chinese.
I am able to read and write Chinese characters easily.
I tried to use the indexOf() method of String to check whether a province (e.g. 广东) is contained within an address (中国 广东). However, this always returns -1.
When I try to check for numbers (e.g. whether 103 is contained within 9910399) it works fine.
Do I need to do something different to handle UTF-8 string matching? Thanks. Matt
I have just tried your example, and although I do not have Chinese fonts on my system (so the characters are not displayed correctly), indexOf() works fine for me.
So, check the encoding of your source files (*.java). For example, if you are using Eclipse, check it under Window/Preferences/General/Workspace/Text file encoding. I am using UTF-8.
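For reference, here is a minimal sketch of the check from the question. The class name ProvinceCheck and the helper findProvince are made up for illustration. The Chinese text is written as Unicode escapes, so the result does not depend on how the source file is encoded or on which fonts are installed.

```java
// Sketch only: ProvinceCheck and findProvince are hypothetical names.
class ProvinceCheck {
    // Returns the index of province within address, or -1 if it is absent.
    static int findProvince(String address, String province) {
        return address.indexOf(province);
    }

    public static void main(String[] args) {
        // The address and province from the question, as Unicode escapes.
        String address = "\u4E2D\u56FD \u5E7F\u4E1C";
        String province = "\u5E7F\u4E1C";

        System.out.println(findProvince(address, province)); // 3
        System.out.println(address.contains(province));      // true
    }
}
```

If this prints 3 on your machine but the version with literal Chinese characters prints -1, the strings being compared are probably not the characters you think they are.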
The second thing is the encoding used by the Java compiler. In the case of Eclipse you do not have to specify anything. For javac, I think you should explicitly set the encoding with the -encoding option (for example, javac -encoding UTF-8). Otherwise the default OS encoding will probably be used.
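To illustrate why a wrong compiler encoding could produce -1, here is a rough simulation. It assumes (this is a guess about your setup, not something the question confirms) that the address arrives correctly decoded at runtime, while the province literal in the source file was read with a single-byte charset. The class name EncodingMismatchDemo and the helper misdecode are made up for the example, and ISO-8859-1 stands in for whatever the OS default might be.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: imitates a compiler reading a UTF-8 literal with the wrong charset.
class EncodingMismatchDemo {
    // Takes the UTF-8 bytes of text and decodes them as ISO-8859-1,
    // which turns every byte into its own (wrong) character.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Correctly decoded address, e.g. as read from a file or database.
        String address = "\u4E2D\u56FD \u5E7F\u4E1C";
        // The province as intended, and as a misread literal would look.
        String province = "\u5E7F\u4E1C";
        String mangled = misdecode(province);

        System.out.println(mangled.length());          // 6, one char per UTF-8 byte
        System.out.println(address.indexOf(mangled));  // -1
        System.out.println(address.indexOf(province)); // 3
    }
}
```

Note that if both strings were literals in the same misread file, they would be mangled the same way and could still match, so the mismatch only shows up when one side is decoded correctly and the other is not.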
Good luck.