Java encoding for Japanese characters

2023-02-14 07:51

I have a file whose name contains Japanese characters: S－最終条件.pdf. In Java, the fileName variable holds the same name: S－最終条件.pdf.

// Support for Japanese file name
fileNameX = new String(fileName.getBytes("Shift_JIS"),"ISO8859_1");

The resulting fileNameX comes out as S?最終条件.pdf, and this causes an error. I am trying to stream the file out as a PDF, but the Japanese character "－" is not recognized and an error is thrown while streaming.

Please help me solve this issue.

Thanks, Prasanna

Let's see what your code actually does:

//Encode the String fileName (held internally as UTF-16) in Shift_JIS
//bytes now contains the binary Shift_JIS representation of your String
final byte[] bytes = fileName.getBytes("Shift_JIS");

//Create a new (UTF-16) String by decoding bytes as ISO8859_1
//The Shift_JIS-encoded bytes are misinterpreted as ISO8859_1
new String(bytes,"ISO8859_1");

Java strings use UTF-16 as their internal representation. You cannot choose a target encoding when you create a String, because UTF-16 is fixed; what you specify is the source encoding of the byte array, which here must be "Shift_JIS".
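A minimal sketch of that point, assuming a file name made only of characters Shift_JIS can represent: decoding Shift_JIS bytes with the same charset restores the string, while decoding them as ISO-8859-1 does not. The class and method names are illustrative only.

```java
import java.nio.charset.Charset;

public class EncodingRoundTrip {
    // Encode with one charset, then decode the resulting bytes with another.
    static String roundTrip(String s, String encodeAs, String decodeAs) {
        byte[] bytes = s.getBytes(Charset.forName(encodeAs));
        return new String(bytes, Charset.forName(decodeAs));
    }

    public static void main(String[] args) {
        String text = "最終条件.pdf"; // only characters that Shift_JIS can represent

        // Same charset on both sides: the original string comes back.
        System.out.println(roundTrip(text, "Shift_JIS", "Shift_JIS"));

        // Mismatched charsets: every Shift_JIS byte becomes its own Latin-1 char.
        System.out.println(roundTrip(text, "Shift_JIS", "ISO-8859-1"));
    }
}
```

The second line prints 12 Latin-1 characters (4 two-byte kanji plus ".pdf") instead of the 8-character original, which is the mojibake the wrong source encoding produces.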

Used as-is, without this conversion, the file name should come out correct.

This is a mapping problem between Shift_JIS and Unicode. Shift_JIS does not cover every Unicode character, so characters it cannot represent are replaced with "?".

The following is the result of converting each character from Unicode to Shift_JIS (NG = cannot be converted, OK = can be converted):

RESULT  UNICODE
[NG]    U+2012 (FIGURE DASH)
[NG]    U+2013 (EN DASH)
<OK>    U+2014 (EM DASH)
[NG]    U+2015 (HORIZONTAL BAR)
<OK>    U+2212 (MINUS SIGN)
[NG]    U+FF0D (FULLWIDTH HYPHEN-MINUS)
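The table can be checked against your own JDK with CharsetEncoder.canEncode. This is a sketch, assuming the JDK ships the Shift_JIS and Windows-31J charsets (standard JDKs do, but neither is among the charsets every Java platform is required to support); the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class DashMapping {
    // Returns true if the named charset's encoder can map this single char.
    static boolean canEncode(String charsetName, char c) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        char[] dashes = {'\u2012', '\u2013', '\u2014', '\u2015', '\u2212', '\uFF0D'};
        for (char c : dashes) {
            // Print one row per dash variant: OK if encodable, NG otherwise.
            System.out.printf("U+%04X  Shift_JIS=%s  Windows-31J=%s%n",
                    (int) c,
                    canEncode("Shift_JIS", c) ? "OK" : "NG",
                    canEncode("Windows-31J", c) ? "OK" : "NG");
        }
    }
}
```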

One solution is to replace these characters before converting:

U+2012,U+2013,U+2015 --> U+2014
U+FF0D               --> U+2212
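A minimal sketch of that replacement, assuming plain String.replace is enough for the file names involved; normalizeDashes is a hypothetical helper, not a JDK API.

```java
import java.nio.charset.Charset;

public class DashNormalizer {
    // Hypothetical helper: rewrites dash variants that Java's Shift_JIS cannot
    // encode into look-alike characters that it can, per the mapping above.
    static String normalizeDashes(String s) {
        return s.replace('\u2012', '\u2014')  // FIGURE DASH            -> EM DASH
                .replace('\u2013', '\u2014')  // EN DASH                -> EM DASH
                .replace('\u2015', '\u2014')  // HORIZONTAL BAR         -> EM DASH
                .replace('\uFF0D', '\u2212'); // FULLWIDTH HYPHEN-MINUS -> MINUS SIGN
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        String fileName = "S\uFF0D最終条件.pdf"; // U+FF0D is the "－" from the question
        String normalized = normalizeDashes(fileName);
        // If every character is now encodable, no '?' is substituted on the way.
        String roundTrip = new String(normalized.getBytes(sjis), sjis);
        System.out.println(roundTrip.equals(normalized));
    }
}
```

Note that this changes the string itself: if the name is also used to locate the file on disk, keep the original string for that lookup.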

The answers by @josefx and @Yu Sun corn are both correct.

First, as @josefx answered, when you want the Shift JIS representation of a string and then want to convert it back to a String object, you have to pass the same encoding to String#getBytes(String charsetName) and to the constructor String(byte[] bytes, String charsetName).

Second, you have to use Windows-31J instead of Shift_JIS as the encoding name. Windows-31J and Shift_JIS use the same encoding scheme, but their character sets differ slightly: Windows-31J has some additional characters, and a few shared byte codes map to different Unicode characters (for example, byte code 0x817C decodes to U+2212 MINUS SIGN in Java's Shift_JIS but to U+FF0D FULLWIDTH HYPHEN-MINUS in Windows-31J). Note that Windows documentation calls Windows-31J "Shift JIS", so in most cases you should use Windows-31J when you want Shift JIS. As @Yu Sun corn answered, the string "S－最終条件.pdf" contains a character that is not in the Shift JIS character set: －. The Windows-31J character set does contain it.
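The symptom from the question can be reproduced with a short sketch, assuming the "－" in the file name is U+FF0D (FULLWIDTH HYPHEN-MINUS). String.getBytes(Charset) silently substitutes the charset's replacement byte ('?') for unmappable characters, which is where the "?" in S?最終条件.pdf comes from. The class name is illustrative.

```java
import java.nio.charset.Charset;

public class Windows31JDemo {
    // Encode and decode with the same charset; unmappable chars become '?'.
    static String roundTrip(String s, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String fileName = "S\uFF0D最終条件.pdf"; // U+FF0D is the "－" from the question
        System.out.println(roundTrip(fileName, "Shift_JIS"));   // "－" is replaced by '?'
        System.out.println(roundTrip(fileName, "Windows-31J")); // "－" survives
    }
}
```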

Finally, the code you should use looks like this:

// Get the byte representation of the Japanese characters in Windows-31J encoding.
// Windows-31J (aka MS932) was the default encoding for a Java VM on Windows with a Japanese locale
// (since JDK 18, JEP 400 makes UTF-8 the default charset on all platforms).
byte[] textBytes = fileName.getBytes("Windows-31J");

// Decode the bytes back into a String object with the same encoding.
System.out.println(new String(textBytes, "Windows-31J"));
