Java encoding for Japanese characters
I have a file name with Japanese characters: S-最終条件.pdf. In Java, the file name is: S-最終条件.pdf.
// Support for Japanese file name
fileNameX = new String(fileName.getBytes("Shift_JIS"),"ISO8859_1");
The output fileNameX comes out as S?最終条件.pdf, and hence an error is thrown. I am trying to write the file out as a PDF stream, but the particular Japanese character "-" is not recognised, and an error is thrown while streaming.
Please help me solve this issue.
Thanks, Prasanna

Let's see what your code actually does:
// Encode the (UTF-16) String fileName as Shift_JIS:
// bytes now contains the binary Shift_JIS representation of your String
final byte[] bytes = fileName.getBytes("Shift_JIS");
// Create a new (UTF-16) String by decoding those bytes as ISO8859_1,
// i.e. the Shift_JIS-encoded bytes are misinterpreted as ISO8859_1
new String(bytes, "ISO8859_1");
Java strings use UTF-16 for their internal representation. You cannot specify a target encoding when you create a String, because UTF-16 is fixed; you can only specify the correct source encoding of the byte array, which here is "Shift_JIS".
In other words, the file name should already be correct without any conversion.
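To make the difference concrete, here is a minimal, self-contained sketch. The class `CharsetRoundTrip` and its methods are hypothetical names, not from the question, and the sample assumes the file name uses an ASCII hyphen so that every character is representable in Shift_JIS. It passes `Charset` objects instead of charset names, which avoids the checked `UnsupportedEncodingException`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Shift_JIS ships with standard JDKs but is not one of the StandardCharsets constants.
    private static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // What the question's code does: encode as Shift_JIS, then decode the same bytes
    // as ISO-8859-1. Every byte becomes one Latin-1 char, so the Japanese text is garbled.
    static String mismatchedDecode(String s) {
        return new String(s.getBytes(SHIFT_JIS), StandardCharsets.ISO_8859_1);
    }

    // Encode and decode with the same charset: the original String comes back,
    // provided every character is representable in Shift_JIS.
    static String roundTrip(String s) {
        return new String(s.getBytes(SHIFT_JIS), SHIFT_JIS);
    }

    public static void main(String[] args) {
        String fileName = "S-最終条件.pdf"; // assumption: ASCII hyphen (U+002D)
        System.out.println(mismatchedDecode(fileName));
        System.out.println(roundTrip(fileName));
    }
}
```

The first line printed is mojibake with one Latin-1 character per Shift_JIS byte; the second matches the input.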
This is a mapping problem between Shift_JIS and Unicode. Shift_JIS does not contain every Unicode character, so characters it cannot represent become "?".
The following is the result of converting various dash characters from Unicode to Shift_JIS (<OK> = encoded, [NG] = becomes "?"):
Result  Unicode
[NG]    U+2012 (FIGURE DASH)
[NG]    U+2013 (EN DASH)
<OK>    U+2014 (EM DASH)
[NG]    U+2015 (HORIZONTAL BAR)
<OK>    U+2212 (MINUS SIGN)
[NG]    U+FF0D (FULLWIDTH HYPHEN-MINUS)
One solution is to replace these code points before encoding:
U+2012,U+2013,U+2015 --> U+2014
U+FF0D --> U+2212
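A minimal sketch of that replacement, applying exactly the mappings listed above before the string is encoded. `DashNormalizer` and `normalizeDashes` are hypothetical names, and the sample assumes the dash in the file name is U+FF0D FULLWIDTH HYPHEN-MINUS; the mappings are tailored to Java's `Shift_JIS` charset as described in the table.

```java
public class DashNormalizer {
    // Maps the dashes that Shift_JIS cannot encode (per the table above)
    // to dashes that it can encode.
    static String normalizeDashes(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\u2012': // FIGURE DASH
                case '\u2013': // EN DASH
                case '\u2015': // HORIZONTAL BAR
                    sb.append('\u2014'); // EM DASH
                    break;
                case '\uFF0D': // FULLWIDTH HYPHEN-MINUS
                    sb.append('\u2212'); // MINUS SIGN
                    break;
                default:
                    sb.append(c); // every other character is kept unchanged
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String fileName = "S\uFF0D最終条件.pdf"; // assumption: the dash is U+FF0D
        System.out.println(normalizeDashes(fileName));
    }
}
```

Note that replacing characters changes the name itself, so this only helps when the exact dash does not matter, e.g. not when the name must match an existing file on disk.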
The answers by @josefx and @Yu Sun corn are both correct.
First, as @josefx answered, when you want the Shift JIS representation of a string and then convert it back to a String object, you have to pass the same encoding to String#getBytes(String charsetName) and to the constructor String(byte[] bytes, String charsetName).
Second, you have to use Windows-31J instead of Shift_JIS as the encoding name. Windows-31J and Shift_JIS use the same encoding scheme, but their character sets differ slightly: Windows-31J has some additional characters. (Note that Windows documentation calls Windows-31J "Shift JIS", so in most cases you should use Windows-31J when you want Shift JIS.) As @Yu Sun corn answered, the string "S-最終条件.pdf" contains a character that is not in the Shift JIS character set: "-". The Windows-31J character set does contain this character.
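Before switching charsets, it can help to check which characters in the real file name are the problem. The following sketch (class and method names are hypothetical) uses `CharsetEncoder.canEncode(char)` to report every character that `Shift_JIS` or `Windows-31J` cannot encode; pass the actual file name as the first program argument. It checks one UTF-16 `char` at a time, so a character outside the BMP is checked as two separate surrogate halves.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

public class UnmappableFinder {
    // Returns the indices of the chars in s that the given charset cannot encode.
    static List<Integer> unencodableIndices(String s, Charset cs) {
        CharsetEncoder encoder = cs.newEncoder();
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            if (!encoder.canEncode(s.charAt(i))) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String fileName = args.length > 0 ? args[0] : "S-最終条件.pdf";
        for (String name : new String[] {"Shift_JIS", "Windows-31J"}) {
            Charset cs = Charset.forName(name);
            for (int i : unencodableIndices(fileName, cs)) {
                // Print the code unit so the exact dash variant can be identified
                System.out.printf("%s cannot encode U+%04X at index %d%n",
                        cs.name(), (int) fileName.charAt(i), i);
            }
        }
    }
}
```

If a character is reported for `Shift_JIS` but not for `Windows-31J`, switching the encoding name is enough; if both report it, a replacement like the one in the previous answer is still needed.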
Finally, the code you should use looks like this:
// Get the byte-stream representation of the Japanese characters in Windows-31J encoding.
// Windows-31J (aka MS932) is the default encoding when you run the Java VM on Windows with a Japanese locale
// (prior to JDK 18, which made UTF-8 the default charset).
byte[] textBytes = name.getBytes("Windows-31J");
// Convert the byte-stream representation back to a String object
System.out.println(new String(textBytes, "Windows-31J"));