Discard unprintable characters returned in server's XML response
While trying to use the Bing API to search, I am getting characters that are not printable and do not seem to hold any extra information. The goal is to save the XML (UTF-8) response as a text file to be parsed later.
My code currently looks something like this:
URL url = new URL(queryURL);
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
BufferedWriter out = new BufferedWriter(new FileWriter(query+"-"+saveResultAs));
String str = in.readLine();
out.write(str);
in.close();
out.close();
When I send the contents of 'str' to console it looks something like this:
and here's what the newly created local XML file looks like:
What should I be doing to convert the UTF-8 text so that str does not have the extra characters?
If you know the encoding up front, you should pass it to the reader explicitly:
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
Do the same with the writer. In your example, FileWriter encodes the file in the platform default encoding, while the XML inside it still declares itself to be UTF-8.
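A minimal sketch of that change, assuming Java 7 or later. The class and method names (Utf8Copy, copyUtf8) are made up for illustration. It names UTF-8 on both the reader and the writer, and it copies the whole stream rather than calling readLine() once, which would only return the first line.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original question's code.
class Utf8Copy {
    // Decodes the source as UTF-8 and encodes the target as UTF-8,
    // so the platform default charset is never involved.
    static void copyUtf8(InputStream source, OutputStream target) throws IOException {
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(source, StandardCharsets.UTF_8));
             BufferedWriter out = new BufferedWriter(
                 new OutputStreamWriter(target, StandardCharsets.UTF_8))) {
            char[] buffer = new char[8192];
            int n;
            // Copy in chunks until end of stream; line breaks are preserved.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}
```

With the variables from the question, this could be called as Utf8Copy.copyUtf8(url.openStream(), new FileOutputStream(query + "-" + saveResultAs)).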
It may be wise to read the encoding from the XML declaration to avoid surprises.
If you only want to store the data for later use, there is no need to decode and re-encode it at all. Just read the bytes and write them out unchanged, and leave the task of detecting the encoding to the XML parser.
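A rough sketch of the byte-for-byte approach, assuming Java 7 or later for java.nio.file. RawSave and save are hypothetical names. No Reader or Writer is involved, so no charset is ever applied to the data.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the original question's code.
class RawSave {
    // Streams the response bytes to the target file untouched and
    // returns the number of bytes written.
    static long save(InputStream source, Path target) throws IOException {
        try (InputStream in = source) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With the variables from the question, this could be called as RawSave.save(new URL(queryURL).openStream(), Paths.get(query + "-" + saveResultAs)).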
The XML parser will handle the decoding and feed the resulting characters back to you (e.g. a SAX parser does this via the characters() method callback). All you then need to do is store them in a suitable file (perhaps with a suitable byte-order mark?).
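As an illustration of the characters() callback, here is a small sketch using the SAX parser that ships with the JDK. SaxText and textOf are hypothetical names, and collecting all character data into one string is only for demonstration. A real handler would probably track elements as well. The parser is given the raw InputStream, so it picks the encoding itself.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original question's code.
class SaxText {
    // Parses the XML and returns all character data the parser reports.
    static String textOf(InputStream xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser has already decoded the bytes into chars here.
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        return text.toString();
    }
}
```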