Different behaviors between XmlDocument.LoadXml and XDocument.Parse
We converted our project from XmlDocument to XDocument a few days ago, but we found a strange behavior when XDocument.Parse processes a character entity in an attribute value. Sample code follows:
The XML string:
string xml = @"<char symbol=""&#x0;"" />";
The XmlDocument.LoadXml code and result:
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(xml);
Console.WriteLine(xmlDocument.OuterXml);
Result:
<char symbol="&#x0;" />
The XDocument.Parse code and exception:
XDocument xDocument = XDocument.Parse(xml);
Console.WriteLine(xDocument.ToString());
Exception:
A first chance exception of type 'System.Xml.XmlException' occurred in System.Xml.dll
'.', hexadecimal value 0x00, is an invalid character. Line 1, position 18.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
   at System.Xml.XmlTextReaderImpl.Throw(Int32 pos, String res, String[] args)
   at System.Xml.XmlTextReaderImpl.ParseNumericCharRefInline(Int32 startPos, Boolean expand, StringBuilder internalSubsetBuilder, Int32& charCount, EntityType& entityType)
   at System.Xml.XmlTextReaderImpl.ParseNumericCharRef(Boolean expand, StringBuilder internalSubsetBuilder, EntityType& entityType)
   at System.Xml.XmlTextReaderImpl.HandleEntityReference(Boolean isInAttributeValue, EntityExpandType expandType, Int32& charRefEndPos)
   at System.Xml.XmlTextReaderImpl.ParseAttributeValueSlow(Int32 curPos, Char quoteChar, NodeData attr)
   at System.Xml.XmlTextReaderImpl.ParseAttributes()
   at System.Xml.XmlTextReaderImpl.ParseElement()
   at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
   at System.Xml.XmlTextReaderImpl.Read()
   at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
   at System.Xml.Linq.XDocument.Parse(String text, LoadOptions options)
   at System.Xml.Linq.XDocument.Parse(String text)
It seems that "&#x0;" refers to an invalid character: when we changed the value to a valid character such as "`", both methods worked well.
Is there any way to make XDocument.Parse ignore invalid characters in attributes, as XmlDocument.LoadXml does?
According to this article, the value is actually invalid. In my experience, the XDocument class follows the XML standard much more strictly than XmlDocument (which I think is a good thing).
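The XML 1.0 Char production excludes U+0000, and the Legal Character well-formedness constraint applies the same rule to characters written as character references, so &#x0; is not allowed anywhere in a well-formed document. A quick way to confirm this from .NET, assuming .NET Framework 4.0 or later (where XmlConvert.IsXmlChar is available), is the sketch below; the variable names are illustrative only.

```csharp
using System;
using System.Xml;

// XML 1.0, section 2.2, Char production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// XmlConvert.IsXmlChar tests a single UTF-16 code unit against that range
// (surrogate pairs need XmlConvert.IsXmlSurrogatePair instead).
bool nulIsLegal = XmlConvert.IsXmlChar('\0');      // U+0000, what &#x0; refers to
bool backtickIsLegal = XmlConvert.IsXmlChar('`');  // U+0060, the replacement that worked

Console.WriteLine($"U+0000 legal: {nulIsLegal}");      // U+0000 legal: False
Console.WriteLine($"U+0060 legal: {backtickIsLegal}"); // U+0060 legal: True
```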
Read the article; it gives suggestions on how to get around that error.
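One possible workaround, sketched here on the assumption that you control how the string is parsed: the difference between the two APIs comes down to character checking. XmlDocument.LoadXml reads through an XmlTextReader, whose Normalization property is false by default, and the XmlTextReader documentation notes that this also disables range checking for numeric character references such as &#0;. XDocument.Parse instead reads with XmlReaderSettings.CheckCharacters set to true. Loading through XmlReader.Create with CheckCharacters = false should therefore accept the reference. Serializing the result with default writer settings (as XDocument.ToString() does) is likely to reject the NUL again, so this sketch also writes with a writer whose CheckCharacters is false. Variable names are illustrative, not part of the original code.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// The document from the question: an attribute holding a character
// reference to U+0000, which XML 1.0 does not allow.
string xml = @"<char symbol=""&#x0;"" />";

// Disable character checking so the reader accepts &#x0;,
// roughly matching what XmlDocument.LoadXml tolerates.
var readerSettings = new XmlReaderSettings { CheckCharacters = false };

XDocument xDocument;
using (var stringReader = new StringReader(xml))
using (var xmlReader = XmlReader.Create(stringReader, readerSettings))
{
    xDocument = XDocument.Load(xmlReader);
}

// The attribute value now contains a raw NUL character.
string symbol = (string)xDocument.Root.Attribute("symbol");
Console.WriteLine((int)symbol[0]); // 0

// Writing it back out also needs character checking turned off;
// such a writer escapes the NUL as a character reference instead of throwing.
var writerSettings = new XmlWriterSettings
{
    CheckCharacters = false,
    OmitXmlDeclaration = true
};
var sb = new StringBuilder();
using (var xmlWriter = XmlWriter.Create(sb, writerSettings))
{
    xDocument.Save(xmlWriter);
}
string output = sb.ToString();
Console.WriteLine(output);
```

Keep in mind that this accepts input the XML specification rejects, so any other consumer of the same string may still refuse it; stripping or replacing the invalid reference before parsing is the stricter alternative.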