Google Translate v2 API returning non-UTF-8 characters

I am trying to use the Google Translate v2 API in my App Engine project. However, accented characters come back with a broken encoding. Case in point: the word "student", which should be "étudiants" in French, becomes "Ã©tudiants". Here is my code:

    URL url = new URL(
            "https://www.googleapis.com/language/translate/v2?key=" + KEY
                    + "&q=" + urlEncodedText + "&source=en&target="
                    + urlEncodedLang);
    try {
        InputStream googleStream = url.openStream();

        // make a new buffered reader by reading the page at the URL given
        // above
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                googleStream));

        // temp string that holds text line by line
        String line;

        // read the contents of the reader/the page by line, until there are
        // no lines left
        while ((line = reader.readLine()) != null) {
            // keep adding each line to totalText
            totalText = totalText + line + "\n";
        }
        // remember to always close the reader
        reader.close();

    } catch (Exception ex) {
        ex.printStackTrace();
    }

Typing the same URL in a browser (Chrome on Ubuntu) works fine and returns a JSON response containing the properly accented characters.

What am I missing here? Thanks


To make sure the response is decoded as UTF-8, pass the charset to the reader explicitly:

    BufferedReader reader = new BufferedReader(new InputStreamReader(googleStream, "UTF-8"));

Otherwise, InputStreamReader falls back to the platform's default encoding, which is probably ISO-8859-1 here.
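To see why the charset matters, here is a minimal sketch that decodes the same UTF-8 bytes twice, once as UTF-8 and once as ISO-8859-1. The class name `CharsetSketch`, the helper `readAll`, and the sample JSON body are made up for illustration and are not part of the Translate API. It assumes Java 8 or later for `UncheckedIOException`; on older runtimes the string form `"UTF-8"` shown above does the same job.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSketch {

    // Hypothetical helper: reads the whole stream as text, decoding the
    // bytes with the given charset instead of the platform default.
    public static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the bytes a server would send; \u00e9 is "é",
        // which UTF-8 encodes as the two bytes C3 A9.
        byte[] body = "{\"translatedText\": \"\u00e9tudiants\"}"
                .getBytes(StandardCharsets.UTF_8);

        // Decoded with the charset the bytes were written in.
        String right = readAll(new ByteArrayInputStream(body),
                StandardCharsets.UTF_8);
        // Decoded with the wrong charset: each byte becomes one character,
        // so the two-byte "é" turns into two separate characters.
        String wrong = readAll(new ByteArrayInputStream(body),
                StandardCharsets.ISO_8859_1);

        System.out.print(right);
        System.out.print(wrong);
    }
}
```

In the question's code, the equivalent change would be to wrap `googleStream` the same way. Using a `StringBuilder` also avoids the repeated string concatenation in the original loop.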


You can also try the Google Translate API v2 for Java library, which handles this for you.
