String received with utf8 format doesn't get displayed correctly
I want to know how to read a string in Java from a file that contains letters from different languages.
I used UTF-8. It decodes the letters of some languages correctly, but Latin letters are not displayed correctly.
So, how can I read letters from all languages?
Alternatively, is there another encoding that would let me read letters from all languages?
Here's my code:
URL url = new URL("http://google.cm");
URLConnection urlc = url.openConnection();
// Decode the response body as UTF-8, regardless of what the server declares
BufferedReader buffer = new BufferedReader(new InputStreamReader(urlc.getInputStream(), "UTF-8"));
StringBuilder builder = new StringBuilder();
int byteRead;
// read() returns one decoded character at a time, or -1 at end of stream
while ((byteRead = buffer.read()) != -1)
{
    builder.append((char) byteRead);
}
buffer.close();
text = builder.toString();
When I display "text", the letters are not displayed correctly.
Reading a UTF-8 file is fairly simple in Java:
Reader r = new InputStreamReader(new FileInputStream(filename), "UTF-8");
If that isn't working, the issue lies elsewhere.
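As a minimal sketch of that one-liner in use, the class below reads a whole file into a String while decoding it as UTF-8. The class name Utf8FileReader and the method readAll are made up for illustration; only the reader construction comes from the line above.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class Utf8FileReader {

    // Reads the entire file and decodes its bytes as UTF-8.
    static String readAll(String filename) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the underlying stream)
        try (Reader r = new BufferedReader(
                new InputStreamReader(new FileInputStream(filename), StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            // read(char[]) returns the number of chars decoded, or -1 at end of file
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

If the file really is UTF-8, the returned String holds the right characters; garbled output after that point usually means the display side (console, font, or output encoding) is at fault.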
EDIT: According to iconv, Google Cameroon is serving invalid UTF-8. It seems to actually be ISO-8859-1.
EDIT2: Actually, I was wrong. It serves (and declares) valid UTF-8 if the user agent contains "Mozilla/5.0" (or higher), but valid ISO-8859-1 in (some) other cases. So the best bet is to call getContentType on the connection and check the declared charset before decoding, rather than hard-coding "UTF-8".
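A rough sketch of that check is shown below. It assumes the server declares its encoding as a charset parameter in the Content-Type header. The class CharsetAwareFetch and both of its methods are hypothetical names, and the ISO-8859-1 fallback and the "Mozilla/5.0" user agent are assumptions based on the behaviour described above, not requirements.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class CharsetAwareFetch {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=ISO-8859-1". Returns the fallback when the header
    // is missing, has no charset parameter, or names an unknown charset.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Case-insensitive check for the "charset=" prefix
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Fetches a URL and decodes the body with the charset the server declares.
    static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        // Assumption: a browser-like user agent, since the response differs by user agent
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        // Assumption: fall back to ISO-8859-1 when no charset is declared
        Charset cs = charsetFromContentType(conn.getContentType(), StandardCharsets.ISO_8859_1);
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), cs))) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

With this in place, the question's code could call CharsetAwareFetch.fetch("http://google.cm") instead of passing "UTF-8" to the InputStreamReader directly. Pages that declare their encoding only in an HTML meta tag would still need separate handling.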