Reading Unicode
I'm using java.io to retrieve text from a server that may send characters such as é. When I print the text with System.err, those characters come out as '?'. I am using UTF-8 encoding. What's wrong?
int len = 0;
char[] buffer = new char[1024];
OutputStream os = sock.getOutputStream();
InputStream is = sock.getInputStream();
os.write(query.getBytes("UTF8"));//iso8859_1"));
Reader reader = new InputStreamReader(is, Charset.forName("UTF-8"));
do {
len = reader.read(buffer);
if (len > 0) {
if (outstring == null) {
outstring = new StringBuffer();
}
outstring.append(buffer, 0, len);
}
} while (len > 0);
System.err.println(outstring);
Edit: just tried the following code:
StringBuffer b = new StringBuffer();
for (char c = 'a'; c < 'd'; c++) {
b.append(c);
}
b.append('\u00a5'); // Japanese Yen symbol
b.append('\u01FC'); // Roman AE with acute accent
b.append('\u0391'); // GREEK Capital Alpha
b.append('\u03A9'); // GREEK Capital Omega
for (int i = 0; i < b.length(); i++) {
System.out.println("Character #" + i + " is " + b.charAt(i));
}
System.out.println("Accumulated characters are " + b);
The output came out as junk as well:
Character #0 is a
Character #1 is b
Character #2 is c
Character #3 is ¥
Character #4 is ?
Character #5 is ?
Character #6 is ?
Accumulated characters are abc¥???
First, verify that the system property file.encoding is in fact UTF-8. If it is, then the problem isn't the code you're running but your terminal program (or other output display) being unable to render the output properly.
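A minimal sketch of that check, assuming you can run a small class on the same JVM and with the same launch settings as the failing program. The class and method names (EncodingCheck, describeEncoding) are made up for illustration.

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    // Builds a short report of the encoding-related settings this JVM is using.
    static String describeEncoding() {
        return "file.encoding = " + System.getProperty("file.encoding")
                + System.lineSeparator()
                + "default charset = " + Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(describeEncoding());
    }
}
```

If the report does not show UTF-8, one option is to start the JVM with -Dfile.encoding=UTF-8 and run the check again.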
Write the text to a file and check how it looks there. If it appears correctly in the file, the problem is with your error stream (its encoding is not UTF-8). If it shows up as junk in the file as well, the server's encoding may not be UTF-8.
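One way to take the terminal out of the picture is to look at the raw bytes that end up in the file. The sketch below is only an illustration: the class name FileRoundTrip, the helper toHex, and the temp-file name are assumptions, and the sample text is a single é rather than real server output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileRoundTrip {
    // Formats bytes as space-separated two-digit hex values.
    static String toHex(byte[] raw) {
        StringBuilder hex = new StringBuilder();
        for (byte b : raw) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample text; in practice this would be the text read from the server.
        String text = "\u00e9";
        Path file = Files.createTempFile("encoding-check", ".txt");
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        // A correctly encoded é is the two bytes c3 a9.
        System.out.println(toHex(Files.readAllBytes(file)));
    }
}
```

If the file holds c3 a9 but the console still shows '?', the data is fine and the output stream or terminal is doing the damage. A single 3f byte means the character was already replaced by '?' before it was written.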
Your second example produces the following output for me.
Character #0 is a
Character #1 is b
Character #2 is c
Character #3 is ¥
Character #4 is Ǽ
Character #5 is Α
Character #6 is Ω
Accumulated characters are abc¥ǼΑΩ
This code produces a correctly encoded UTF-8 file having the same content.
StringBuilder b = new StringBuilder();
for (char c = 'a'; c < 'd'; c++) {
b.append(c);
}
b.append('\u00a5'); // Japanese Yen symbol
b.append('\u01FC'); // Roman AE with acute accent
b.append('\u0391'); // GREEK Capital Alpha
b.append('\u03A9'); // GREEK Capital Omega
PrintStream out = new PrintStream("temp.txt", "UTF-8");
for (int i = 0; i < b.length(); i++) {
out.println("Character #" + i + " is " + b.charAt(i));
}
out.println("Accumulated characters are " + b);
out.close(); // flush buffered output and release the file
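The same idea can be applied to the console rather than a file: wrap System.out (or System.err) in a PrintStream with an explicit encoding. This is a sketch, not a guaranteed fix. It assumes the terminal itself can decode and display UTF-8, and the names Utf8Console and utf8Stream are made up for illustration.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Wraps any OutputStream in a PrintStream that encodes characters as UTF-8.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8"); // true = autoflush on println
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to exist on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        out.println("abc\u00a5\u01fc\u0391\u03a9");
    }
}
```

If the characters still appear as '?' or as pairs of odd symbols, the bytes being sent are UTF-8 but the terminal is interpreting them with a different encoding.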
See also: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)