How do I remove unrecognized characters which come back from a web service?

I am working on an app that calls a REST web service. Sometimes the XML responses contain characters that the phone cannot display, and an empty box is shown in their place. I would like to filter out these characters. How can I detect whether a character can be displayed on the screen?

Some specific characters include:

http://www.fileformat.info/info/unicode/char/0094/index.htm
http://www.fileformat.info/info/unicode/char/0080/index.htm
http://www.fileformat.info/info/unicode/char/0092/index.htm


Android supports the following encodings:

  • Xml.Encoding ISO_8859_1
  • Xml.Encoding US_ASCII
  • Xml.Encoding UTF_16
  • Xml.Encoding UTF_8

US_ASCII shouldn't cause any problems.

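For ISO_8859_1, look up the control characters in the ranges 0x00-0x1f and 0x7f-0x9f (Wikipedia has a table) and filter them out. The characters listed in the question (0x80, 0x92, 0x94) all fall in the second range. And of course use a font that covers the characters you keep.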
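
The range check described above could look roughly like the following. This is only a sketch: the class and method names (ControlCharFilter, strip) are made up for illustration, and keeping tab, line feed and carriage return is an assumption on my part, since dropping them would usually break the layout of the text.

```java
public final class ControlCharFilter {
    private ControlCharFilter() {}

    /**
     * Returns the input without characters in 0x00-0x1F and 0x7F-0x9F.
     * Tab, line feed and carriage return are kept (an assumption, see above).
     */
    public static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean keptWhitespace = c == '\t' || c == '\n' || c == '\r';
            boolean control = c <= 0x1F || (c >= 0x7F && c <= 0x9F);
            if (keptWhitespace || !control) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Calling ControlCharFilter.strip(text) on the parsed response before handing it to a TextView would drop U+0080, U+0092 and U+0094 and leave everything else alone.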

Using UTF_8 or UTF_16 is more complex; read Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets".

You might find this mailing list useful.


First of all, try to get the default charset of your device with:

Charset.defaultCharset();

Then find the charset of your XML by looking at the encoding pseudo-attribute of the XML declaration or at the Content-Type header of the HTTP response.
For example:

<?xml version="1.0" encoding="utf-8" ?>

or

Content-Type: text/html; charset=utf-8
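
If you read the header yourself, pulling the charset parameter out of it might look like this. It is a rough sketch, not a full header parser: the class name ContentTypeCharset and the fallback argument are hypothetical, and the regular expression only handles the simple charset=value form shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class ContentTypeCharset {
    // Matches charset=utf-8 or charset="utf-8", ignoring case.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    private ContentTypeCharset() {}

    /** Returns the charset named in the header value, or fallback if none is present. */
    public static String parse(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : fallback;
    }
}
```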

If the default charset of your device is different from the charset of the XML, you have to pay attention when you create new strings with:

new String(bytes);

because if you forget to specify the correct encoding, Dalvik will use the default encoding of the device, which can cause display errors.
Remember to use:

new String(bytes, encoding);
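
To see why the encoding matters, here is a small sketch that decodes the same bytes two ways. The class name DecodeExample is made up for illustration; it uses the String(byte[], Charset) constructor with Charset.forName so there is no checked exception to handle.

```java
import java.nio.charset.Charset;

public final class DecodeExample {
    private DecodeExample() {}

    /** Decodes the raw response bytes using the charset the server declared. */
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The two bytes below are the UTF-8 encoding of U+00E9 (e with acute accent).
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9};
        System.out.println(decode(bytes, "UTF-8"));      // one character
        System.out.println(decode(bytes, "ISO-8859-1")); // two characters
    }
}
```

Decoded as UTF-8 the two bytes become the single character é; decoded as ISO-8859-1 they become the two characters Ã©, which is the kind of garbage you see when the wrong charset is used.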


It appears I can call Character.isIdentifierIgnorable() on each character and not include it if it is ignorable. Character.isISOControl() will probably also work.
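
Something along these lines is what I have in mind. It is a sketch only, with a made-up class name (UndisplayableFilter). It walks the string by code point and drops whatever Character.isIdentifierIgnorable() reports. Note that this does not check whether the font actually has a glyph for a character; it only removes the control and format characters.

```java
public final class UndisplayableFilter {
    private UndisplayableFilter() {}

    /** Returns the input without code points that Character.isIdentifierIgnorable() flags. */
    public static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (!Character.isIdentifierIgnorable(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

One difference between the two methods: Character.isISOControl() is also true for tab, line feed and carriage return, so swapping it in would remove line breaks as well, while isIdentifierIgnorable() leaves those three alone.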
