Discard unprintable characters returned in server's XML response

While trying to use the Bing API to search, I am getting characters that are not printable and do not seem to hold any extra information. The goal is to save the XML (UTF-8) response as a text file to be parsed later.

My code currently looks something like this:

    URL url = new URL(queryURL);

    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
    BufferedWriter out = new BufferedWriter(new FileWriter(query+"-"+saveResultAs));
    String str = in.readLine();
    out.write(str);

    in.close();
    out.close();

When I send the contents of 'str' to console it looks something like this:

[image not preserved: console output of str]

and here's what the newly created local XML file looks like:

[image not preserved: contents of the saved XML file]

What should I be doing to convert the UTF-8 text so that str does not have the extra characters?


If you know the encoding up front, pass it to the reader explicitly:

    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));

Do the same for the writer. In your example, FileWriter encodes the file in the platform default encoding, while the XML declaration inside it still claims UTF-8.
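As a minimal sketch of that advice, assuming the response really is UTF-8, the reader and the writer can both be given the charset explicitly. The class and method names (`Utf8Copy`, `copyAsUtf8`) are made up for illustration; the loop also copies every line, not just the first one.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original question.
class Utf8Copy {
    // Decodes the source as UTF-8 and writes it back out as UTF-8, line by line.
    public static void copyAsUtf8(InputStream source, OutputStream target) throws IOException {
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(source, StandardCharsets.UTF_8));
             BufferedWriter out = new BufferedWriter(
                 new OutputStreamWriter(target, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine(); // readLine() strips the terminator, so add one back
            }
        }
    }
}
```

With the variables from the question, this could be called as `Utf8Copy.copyAsUtf8(url.openStream(), new FileOutputStream(query + "-" + saveResultAs))`.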

It may be wise to read the encoding from the XML declaration to avoid surprises.

If you only want to store the data for later use, there is no need to decode and re-encode it at all. Just read the bytes and write them out unchanged, and leave encoding detection to the XML parser.
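A rough sketch of the byte-for-byte approach, using `Files.copy` from `java.nio.file`. No Reader or Writer is involved, so no charset is ever applied. `RawSave` and `saveRaw` are illustrative names only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the original question.
class RawSave {
    // Copies the stream to the target file unchanged and returns the number of bytes written.
    public static long saveRaw(InputStream source, Path target) throws IOException {
        try (InputStream in = source) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With the variables from the question, this could be called as `RawSave.saveRaw(url.openStream(), Paths.get(query + "-" + saveResultAs))`. Whatever the server sent, including any byte-order mark, ends up in the file exactly as received.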


The XML parser will handle encoding/decoding, and the appropriate characters will be fed back to you (e.g. a SAX parser will do this via the characters() method callback). All you need to do is then store that in a suitable file (perhaps with a suitable Byte-Order-Mark?)
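For illustration, here is a small sketch of letting the JDK's SAX parser do the decoding. It is given the raw byte stream, works out the encoding from the XML declaration, and hands already-decoded characters to `characters()`. The `TextCollector` class and its `collectText` method are hypothetical names; a real handler would probably also track elements.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original answer.
class TextCollector {
    // Parses the raw XML bytes and returns all character data concatenated.
    public static String collectText(InputStream xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser has already decoded the bytes at this point.
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        return text.toString();
    }
}
```

The important detail is that the parser receives an `InputStream`, not a `Reader`, so it can choose the charset itself.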
