
illegal character in xml document

I have a program that generates XML files from data in a database. In short, the code does the following:

// Requires: using System.Data; using System.Data.SqlClient; using System.Xml;
string dsn = "a db connection string";
XmlDocument d = new XmlDocument();
using (SqlConnection con = new SqlConnection(dsn)) {
    con.Open();
    string sql = "select id as Id, comment as Comment from Test where ... ";
    using (SqlCommand cmd = new SqlCommand(sql, con))
    using (SqlDataAdapter da = new SqlDataAdapter(cmd)) {
        DataSet ds = new DataSet("EXPORT");
        // Fill the "Test" table, then round-trip the DataSet's XML through XmlDocument.
        da.Fill(ds, "Test");
        d.LoadXml(ds.GetXml());
    }
}
d.Save(@"c:\test.xml");

When I look at the XML file, it contains the invalid character reference &#x1A;:

<EXPORT>
  <Test>
    <Id>2</Id>
    <Comment> Keyboard NB&#x1A;5 linked</Comment>
  </Test>
</EXPORT>

Firefox cannot open this XML file; it reports an invalid character ...

That entity is reserved in ISO 8859-1 and CP1252 and should not be rendered by browsers. But why does XmlDocument output XML that cannot be parsed as valid? Or is it a valid XML document that just cannot be parsed by browsers or imported by Excel and so on? Is there an easy way of getting rid of these reserved 'invalid characters', or of encoding them in a way that browsers do not have a problem with?

Many thanks for your opinions and tips.


Not all characters are representable in XML.

In XML 1.0, none of the characters with values less than 0x20 can be used, except for TAB (0x09), LF (0x0A) and CR (0x0D).

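In XML 1.1, just about anything except NUL (0x00) can be used, although control characters such as 0x1A must be written as character references rather than literally.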

If you have the option to use XML 1.1, and the receiving program supports XML 1.1 (not many do), then you can escape the 0x1A as &#26; or &#x1A;.

Wrapping it in CDATA is not a solution either; CDATA is just a convenience for escaping groups of characters differently than the standard &-mechanism.

Otherwise, you will need to remove it prior to serializing.
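Removing the character before serializing could look like the sketch below. It keeps only the characters allowed by the XML 1.0 Char production (TAB, LF, CR, 0x20-0xD7FF, 0xE000-0xFFFD, and valid surrogate pairs) and drops everything else. The names XmlSanitizer and StripIllegalXml10Chars are hypothetical, made up for this illustration; they are not part of .NET.

```csharp
using System;
using System.Text;

// Hypothetical helper, not a framework class.
public static class XmlSanitizer
{
    // True if a single UTF-16 code unit is a legal XML 1.0 character on its own.
    // Surrogates (0xD800-0xDFFF) are excluded here and handled as pairs below.
    private static bool IsLegalXml10Char(char c)
    {
        return c == '\t' || c == '\n' || c == '\r'
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Returns the input with every character that XML 1.0 forbids removed.
    public static string StripIllegalXml10Chars(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                // A well-formed surrogate pair encodes U+10000..U+10FFFF, which is legal.
                sb.Append(c).Append(text[++i]);
            }
            else if (IsLegalXml10Char(c))
            {
                sb.Append(c);
            }
            // Anything else (control characters such as 0x1A, lone surrogates,
            // 0xFFFE, 0xFFFF) is silently dropped.
        }
        return sb.ToString();
    }
}
```

For the code in the question, one option would be to run this over the string columns of the filled DataSet (for example the Comment column of each row in the "Test" table) before calling ds.GetXml(). Whether to drop the character or replace it with something visible such as a space is a judgment call that depends on the data.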


I've run into this a few times when creating/manipulating XML from SQL data.

But why does XmlDocument output xml that cannot be parsed as valid - or is it a valid xml document that just cannot be parsed by Browsers or imported by Excel and so on

XmlDocument doesn't validate the character data you give it here; it leaves that to you (the developer). This XML document should be rejected as invalid by almost everything that consumes XML (but I could be wrong about that ... you could always test it :P)
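One way to test it from .NET itself is to read the generated XML back through an XmlReader created with XmlReader.Create, which checks character ranges when XmlReaderSettings.CheckCharacters is true (the default). The sketch below is only an illustration; XmlCheck and IsReadableAsXml10 are hypothetical names.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper, not a framework class.
public static class XmlCheck
{
    // Returns true if a strict reader can consume the whole document,
    // false if it throws an XmlException (for example on &#x1A;).
    public static bool IsReadableAsXml10(string xml)
    {
        // CheckCharacters is true by default; it is set explicitly here for clarity.
        var settings = new XmlReaderSettings { CheckCharacters = true };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                // Walk every node so that character checks run over the full document.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

Calling XmlCheck.IsReadableAsXml10(d.OuterXml) before saving would flag the document from the question, since it contains the character reference &#x1A;.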

Almost every time I've hit this problem, I ended up either replacing the offending data with the proper character (if it has one) or just getting rid of it.

You could also try putting your XML inside a CDATA block, but that will bloat the file a tiny bit (I'm not sure how big your file will be overall).


Take a look at this question: xml parse error on illegal character

Conclusion (as I understood it): With XML 1.0 it is impossible to store this value.


Have a look at this answer to see if it helps:

.NET DataSet.GetXml() - what's the default encoding?


I'd think you're processing a Control-Z (end of text file) character. Is this possible?


Make sure to escape XML entities, e.g. & => &amp;. Otherwise, wrap the data in CDATA: http://en.wikipedia.org/wiki/CDATA
