Invalid XML characters
I have a UTF-8 text file. Its content is extracted from rich text documents: MS Word, PDF, HTML, or anything else. I have to pass this content to a web service, but most of the time it contains invalid characters such as form feed or null. Currently, when I pass content containing an invalid character to the web service, it throws an exception (not a valid XML character).
I have found a few characters that are not valid for XML, but is there a proper .NET function to clean the string and remove all invalid characters, or a list of invalid characters from an authoritative site?
Thanks for your help in advance.
http://java.net/jira/browse/JAXB-614
This link lists the set. The following characters are invalid in XML 1.0: '\u0000', '\u0001', '\u0002', '\u0003', '\u0004', '\u0005', '\u0006', '\u0007', '\u0008', '\u000B', '\u000C', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012', '\u0013', '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u0019', '\u001A', '\u001B', '\u001C', '\u001D', '\u001E', '\u001F', '\uFFFE', '\uFFFF'. Unpaired surrogates ('\uD800' through '\uDFFF') are also invalid.
If it is important to send the file's content without any modification, the best option is to escape (encode) the content. If it is not, use the XmlConvert.IsXmlChar method, which checks whether a single character is valid in XML, and drop the characters that fail the check. See my other answer for code samples.
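The filtering approach can be sketched as follows. This is a minimal illustration in Python, not .NET code; it assumes the XML 1.0 `Char` production (tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF) and removes everything outside it. The function name `strip_invalid_xml_chars` is hypothetical. In .NET, the same per-character test is what XmlConvert.IsXmlChar provides.

```python
import re

# Matches any character NOT allowed by the XML 1.0 Char production:
# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with every character that XML 1.0 forbids removed."""
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    # Form feed (\x0c) and null (\x00) are dropped; tab and newline are kept.
    cleaned = strip_invalid_xml_chars("page one\x0cpage two\x00\tend\n")
    print(repr(cleaned))
```

Note that this changes the content: the removed characters cannot be recovered on the receiving side.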
Probably the best way is to encode the whole text, for example in Base64.
http://en.wikipedia.org/wiki/Base64
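A minimal sketch of the Base64 approach, in Python for illustration. It assumes the web service can be changed to decode the payload on its side; the function names `encode_for_xml` and `decode_from_xml` are hypothetical. Because Base64 output uses only ASCII letters, digits, '+', '/', and '=', it never contains a character that XML rejects, and the original text (including form feeds and nulls) survives the round trip.

```python
import base64


def encode_for_xml(text: str) -> str:
    """Encode text as Base64 so it can be placed safely inside an XML element."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_from_xml(payload: str) -> str:
    """Reverse encode_for_xml on the receiving side."""
    return base64.b64decode(payload).decode("utf-8")


if __name__ == "__main__":
    original = "page one\x0cpage two\x00"
    payload = encode_for_xml(original)
    # The round trip restores the exact original, control characters included.
    assert decode_from_xml(payload) == original
    print(payload)
```

The trade-off is size: Base64 output is roughly a third larger than the input, and the content is no longer human-readable inside the XML.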
Regards,