Japanese Character Encoding in Java

2023-04-11 11:35

Here's my problem. I'm using Apache POI in Java to read an Excel (.xls or .xlsx) file and display its contents. The spreadsheet contains some Japanese characters, and every one of them comes out as "???" in my output. I tried Shift-JIS, UTF-8 and many other encodings, but none of them work. Here's my encoding code:

public String encoding(String str) throws UnsupportedEncodingException{
  String Encoding = "Shift_JIS";
  return this.changeCharset(str, Encoding);
}
public String changeCharset(String str, String newCharset) throws UnsupportedEncodingException {
  if (str != null) {
    byte[] bs = str.getBytes();
    return new String(bs, newCharset);
  }
  return null;
}

I pass every string I get to encoding(str). But when I print the return value, it is still something like "???" (see below) rather than Japanese characters (Hiragana, Katakana or Kanji).

title-jp=???

Can anyone help me with this? Thank you so much.

Your changeCharset method seems strange. String objects in Java are best thought of as not having a specific character set. They use Unicode and so can represent all characters, not only one regional subset. Your method says: turn the string into bytes using my system's default character set (whatever that may be), and then try to interpret those bytes using some other character set (specified in newCharset). That probably won't work. If you convert a string to bytes in one encoding, you should read those bytes back with the same encoding.
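To make that concrete, here is a minimal sketch of the difference between a matched and a mismatched encode/decode pair. The class and method names are made up for illustration, and it assumes the JDK in use ships the Shift_JIS charset (the standard desktop JDKs do). US-ASCII stands in for a system default charset that cannot represent Japanese.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the question's code.
class CharsetRoundTrip {
    // Encode and decode with the SAME charset: lossless for any text
    // that the charset can represent.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    // Mirrors the question's changeCharset: encode with one charset,
    // decode with a different one.
    static String mismatched(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String text = "\u65E5\u672C\u8A9E"; // three Kanji, written as Unicode escapes
        Charset sjis = Charset.forName("Shift_JIS");

        // Same charset both ways: the original string comes back.
        System.out.println(text.equals(roundTrip(text, sjis)));

        // UTF-8 bytes read as Shift_JIS: garbage, not the original string.
        System.out.println(text.equals(mismatched(text, StandardCharsets.UTF_8, sjis)));

        // A charset that cannot represent Japanese replaces each character
        // with '?' at the getBytes step, so the data is already lost there.
        System.out.println(mismatched(text, StandardCharsets.US_ASCII, sjis));
    }
}
```

The last case is one plausible way to end up with "???": once getBytes() has substituted '?' for unmappable characters, no choice of charset in new String(...) can bring the original text back.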

Update:

To convert a String to Shift-JIS (a regional encoding commonly used in Japan) you can say:

byte[] jis = str.getBytes("Shift_JIS");

If you write those bytes into a file, and then open the file in Notepad on a Windows computer where the regional settings are all Japan-centric, Notepad will display it in Japanese (having nothing else to go on, it will assume the text is in the system's local encoding).

However, you could equally well save it as UTF-8 (prefixed with the 3-byte UTF-8 byte order mark, EF BB BF) and Notepad will also display it as Japanese. Shift-JIS is only one way of representing Japanese text as bytes.
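A rough sketch of both options follows. The class name, helper names and temp-file prefixes are invented for the example, and it again assumes the Shift_JIS charset is available in the running JDK.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class for writing the same text in two encodings.
class JapaneseTextFiles {
    // The 3-byte UTF-8 byte order mark.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Encode the text as Shift_JIS bytes.
    static byte[] toShiftJis(String text) {
        return text.getBytes(Charset.forName("Shift_JIS"));
    }

    // Encode the text as UTF-8 and put the byte order mark in front.
    static byte[] toUtf8WithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[UTF8_BOM.length + body.length];
        System.arraycopy(UTF8_BOM, 0, out, 0, UTF8_BOM.length);
        System.arraycopy(body, 0, out, UTF8_BOM.length, body.length);
        return out;
    }

    public static void main(String[] args) throws IOException {
        String text = "\u65E5\u672C\u8A9E"; // three Kanji, written as Unicode escapes

        Path sjisFile = Files.createTempFile("title-sjis", ".txt");
        Path utf8File = Files.createTempFile("title-utf8", ".txt");
        Files.write(sjisFile, toShiftJis(text));
        Files.write(utf8File, toUtf8WithBom(text));

        // Same text, different byte representations on disk.
        System.out.println(sjisFile + ": " + Files.size(sjisFile) + " bytes");
        System.out.println(utf8File + ": " + Files.size(utf8File) + " bytes");
    }
}
```

Both files hold the same three characters; only the bytes differ (two bytes per Kanji in Shift_JIS, three per Kanji plus the mark in UTF-8).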

I suspect you shouldn't be doing this in the first place. If it really is Apache POI's fault, then you'll need to get the original raw bytes from the data, not just use the system default encoding.

On the other hand, I think it's entirely likely that Apache POI has managed to do the right thing, and it's just an output problem. I suggest you dump the original string you've got (removing your encoding method entirely) in terms of its Unicode code points, e.g.

 // Print each UTF-16 code unit of the string as a hex value
 for (int i = 0; i < text.length(); i++) {
     System.out.println("U+" + Integer.toHexString(text.charAt(i)));
 }

Then check those Unicode values against the ones at the Unicode web site.
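If the code points turn out to be correct, the string itself is fine and only the output step is losing the characters. The sketch below is one way to check that, under assumptions: the class and helper names are invented, and it captures the printed bytes in memory instead of writing to a real console. It formats the code points on one line and shows what a PrintStream emits for the same text under two different output encodings.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class for separating "bad string" from "bad output".
class OutputCheck {
    // Render each code point as U+XXXX, separated by spaces.
    static String describeCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Print the text through a PrintStream that uses the named encoding
    // and return the bytes that were actually written.
    static byte[] printAs(String text, String charsetName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(buffer, true, charsetName);
            out.print(text);
            out.flush();
            return buffer.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String text = "\u65E5\u672C\u8A9E"; // three Kanji, written as Unicode escapes

        System.out.println(describeCodePoints(text));

        // A stream that cannot encode Japanese writes '?' for each character.
        System.out.println(printAs(text, "US-ASCII").length + " bytes as US-ASCII");

        // A UTF-8 stream writes the real encoded characters.
        System.out.println(printAs(text, "UTF-8").length + " bytes as UTF-8");
    }
}
```

In a real program the equivalent would be wrapping System.out, for example new PrintStream(System.out, true, "UTF-8"), and making sure the terminal or log viewer reads that same encoding.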
