
How to convert a webpage from ISO-8859-1 to UTF-8 in Java/Groovy

I want to read webpage A, which is encoded in ISO-8859-1 (according to the browser), and return its content as UTF-8 as part of webpage B.

That is, I want to show the content of page A in the same charset that I use for the rest of page B, which is UTF-8.

How do I do this in Java/Groovy?

Thanks in advance.


In Groovy you could write something like this:

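// Decode page A's bytes as ISO-8859-1; the result is an ordinary String (UTF-16 internally)
def source = new URL("http://www.google.com").getText("ISO-8859-1")
// Encode that String as UTF-8 bytes for output in page B
def target = source.getBytes("UTF-8")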
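A rough Java equivalent of the Groovy snippet might look like the sketch below. It assumes the page really is ISO-8859-1 (it does not look at the Content-Type header), and the class and method names (`PageTranscoder`, `latin1ToUtf8`, `fetchAsUtf8`) are hypothetical. `InputStream.readAllBytes()` requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageTranscoder {
    // Decodes raw ISO-8859-1 bytes into a String, then re-encodes that String as UTF-8.
    static byte[] latin1ToUtf8(byte[] latin1) {
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Reads the whole page at the URL and returns its content as UTF-8 bytes.
    // Assumption: the page is ISO-8859-1; no charset detection is attempted.
    static byte[] fetchAsUtf8(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return latin1ToUtf8(in.readAllBytes());
        }
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create(args.length > 0 ? args[0] : "http://www.google.com").toURL();
        byte[] utf8 = fetchAsUtf8(url);
        System.out.println(utf8.length + " bytes of UTF-8");
    }
}
```

Reading the whole page into memory keeps the code short; for very large pages a streaming approach avoids holding everything at once.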


You don't say what stack you're building on or how you're accessing the content, but the general mechanism for such a transcoding operation is to use UTF-16 as an intermediary (Java's char and String types are UTF-16); that is, decode the ISO-8859-1 bytes into UTF-16 chars, then encode those chars as UTF-8 bytes.
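To make the two steps concrete, here is a small trace that prints each representation of the word "café" along the way. The sample bytes and the class and helper names (`TranscodeTrace`, `hex`, `codeUnits`) are illustrative assumptions, not part of the answer above.

```java
import java.nio.charset.StandardCharsets;

public class TranscodeTrace {
    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Formats the UTF-16 code units of a String as U+XXXX values.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "café" in ISO-8859-1: 'é' is the single byte 0xE9
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        // Step 1: ISO-8859-1 bytes -> UTF-16 chars
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        // Step 2: UTF-16 chars -> UTF-8 bytes; 'é' becomes two bytes
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1: " + hex(latin1));
        System.out.println("UTF-16: " + codeUnits(text));
        System.out.println("UTF-8: " + hex(utf8));
    }
}
```

The three lines show `63 61 66 E9`, then `U+0063 U+0061 U+0066 U+00E9`, then `63 61 66 C3 A9`: the ASCII letters are identical in all three, and only the accented character changes size.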

You could read with an InputStreamReader (using an ISO-8859-1 Charset), then write the characters out via an OutputStreamWriter (using a UTF-8 Charset), which encodes them as UTF-8 bytes.
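A minimal streaming sketch of that Reader/Writer pairing might look like this. The class and method names (`StreamTranscoder`, `transcode`) are hypothetical, and the sample input is an in-memory stream standing in for page A.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class StreamTranscoder {
    // Copies text from an ISO-8859-1 byte stream to a UTF-8 byte stream.
    // The Reader decodes bytes into UTF-16 chars; the Writer encodes them as UTF-8.
    static void transcode(InputStream in, OutputStream out) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1);
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        char[] buffer = new char[8192];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, n);
        }
        // Push any buffered output to 'out'; closing the streams is left to the caller.
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9}; // "café" in ISO-8859-1
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(latin1), out);
        System.out.println(out.size()); // 5, because 'é' now takes two bytes
    }
}
```

Because the copy works chunk by chunk, the whole page never needs to be held in memory; the `flush()` matters because OutputStreamWriter buffers encoded bytes internally.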

Some APIs provide encoding operations as part of their I/O classes (e.g. ServletResponse.getWriter() encodes text using the charset set via setCharacterEncoding() or setContentType()).

I'm ignoring any need to parse and transform the data, which is a whole other can of worms.
