
Convert string from UTF-8 to ISO 8859-1 in Java

I want to encode a UTF-8 string as an ISO 8859-1 string in Java.

I have this:

String title = new String(item.getTitle().getText().getBytes("ISO-8859-1"));

But it isn't working; the output is Sørensen, for example.


There's no such thing as a "UTF-8 string" in Java... there are just strings, which are always in Unicode. (They're effectively always UTF-16.)

You can have a byte array which is an ISO-8859-1 encoded form of a string (or UTF-8 or whatever) but it doesn't make sense to have a string with an encoding.
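To make that distinction concrete, here is a minimal sketch: one String, encoded into two different byte arrays. The class name EncodingDemo and its helper methods are made up for illustration; only String.getBytes(Charset) and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the String itself has no encoding,
// only the byte arrays produced from it do.
class EncodingDemo {

    // Encode the text as ISO-8859-1: one byte per character.
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Encode the same text as UTF-8: non-ASCII characters take several bytes.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Sørensen", written with an escape so the source file's own
        // encoding does not matter.
        String name = "S\u00f8rensen";

        System.out.println(toLatin1(name).length); // 8 bytes, ø is 0xF8
        System.out.println(toUtf8(name).length);   // 9 bytes, ø is 0xC3 0xB8
    }
}
```

Both arrays represent the same eight-character string; which one you need depends only on what the receiving side expects.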

If you've read a string with the incorrect encoding somewhere, the correct thing to do is fix the code which reads the string, rather than trying to decode/encode the data from the string form later.
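As a rough illustration of what "reading with the incorrect encoding" does, the sketch below decodes the same UTF-8 bytes twice, once with the wrong charset and once with the right one. DecodeDemo and its method names are hypothetical, and the byte array stands in for whatever your input source actually delivers.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the damage happens at the point where bytes
// are turned into a String, so that is the place to fix.
class DecodeDemo {

    // Wrong: treats UTF-8 bytes as if they were ISO-8859-1.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Right: decodes the bytes with the charset they were written in.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "S\u00f8rensen"; // "Sørensen"
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        // The two bytes of ø become two separate characters (U+00C3, U+00B8).
        String garbled = decodeAsLatin1(raw);
        System.out.println(garbled.equals("S\u00c3\u00b8rensen")); // true

        // With the matching charset the text round-trips unchanged.
        System.out.println(decodeAsUtf8(raw).equals(original));    // true
    }
}
```

The same idea applies to readers: pass the charset explicitly, for example to InputStreamReader, instead of relying on the platform default.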

If you could give more information about the problem, we can probably give some more useful advice.


This problem isn't to be solved that way. Strings in Java are always in the same encoding (UTF-16); you've basically only changed the content. You need to set the encoding at the destination of this string. If it's the stdout, you need to set its encoding. If it's a file, you need to set its Writer encoding. If it's an HTML page, you need to set the response encoding. If it's a database, you need to set the DB/table/connection encoding. And so on.
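For the Writer case, a minimal sketch might look like the following. WriterDemo and its write method are hypothetical names, and a ByteArrayOutputStream stands in for the real file or socket; the point is only that the charset is chosen where the characters leave the program.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the String is left alone, the encoding is
// configured on the Writer that sits in front of the destination.
class WriterDemo {

    // Writes the text through a Writer using the given charset and
    // returns the bytes that reached the (in-memory) destination.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String name = "S\u00f8rensen"; // "Sørensen"

        System.out.println(write(name, StandardCharsets.ISO_8859_1).length); // 8
        System.out.println(write(name, StandardCharsets.UTF_8).length);      // 9
    }
}
```

For a real file you would wrap a FileOutputStream the same way, and the reader on the other end has to use the matching charset.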

Update: as per the comments:

The string is from an RSS feed that is in UTF-8, and I want to show it in an HTML page that uses ISO 8859 encoding

You'll need to upgrade the HTML page's encoding from vintage ISO 8859 encoding to the modern and world-domination-prepared UTF-8 encoding.

Update 2: as per the comments:

Firefox shows it in the right encoding by default (utf-8) but Internet Explorer for example doesn't

Then the text is actually fine. You don't need to massage the string into another encoding. The symptoms indicate that the character encoding information is missing from the response headers. Firefox actually has a pretty smart encoding detector, while IE will use the platform default encoding when the encoding is unknown. But IE will also fail if the HTML is (drastically) malformed in doctype and head.

Thus, either the HTML response is syntactically invalid, or the response content type wasn't set correctly. Assuming that your website validates and that you're using JSP/Servlet (judging by your post history here), you basically need to add the following line to the top of your JSP:

<%@ page pageEncoding="UTF-8" %>

That's all. It will automatically set both the response encoding (so that the server knows which encoding to use to write the characters to the byte stream of the response) and the encoding in the Content-Type response header (so that the client knows which encoding to use to read/display those characters from the byte stream of the response). For more background information you may find this article useful.
