Parsing XML that contains "&pound;"
I am parsing an XML document that is not well-formed: it contains "&" (as in "&pound;"), and a node is not parsed correctly if it has "&" in it.
e.g. <abcnode>&pound;70.00-&pound;90.00</abcnode>
When I try to get this node's value, it returns "70.00-".
I have no control over this XML, so I have to parse it in its malformed state.
I am using XmlTextReader reader = new XmlTextReader(url); to load the XML from a URL.
I could fetch the XML and replace the &pound; to solve my problem, but the XML can be very large, so for performance reasons I do not want to download the whole file just to replace invalid characters.
Is there a way to parse this XML using XmlTextReader?
XmlTextReader will take a TextReader argument to read from, so you might be able to implement a class that inherits TextReader, override all the ReadXXX() methods and repair the invalid characters in the overrides.
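A minimal sketch of that wrapper idea follows. The names PoundEntityFixingReader and PoundFixDemo are made up for illustration, and it assumes the only problem in the feed is the undeclared &pound; entity, which it rewrites to the numeric reference &#163; as characters stream through. It overrides only Read() and Peek(); the base TextReader builds its other read methods on top of Read(), which is simple but slow for large inputs, so a real version would probably override Read(char[], int, int) as well.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical wrapper: rewrites "&pound;" to "&#163;" on the fly, without buffering the document.
public sealed class PoundEntityFixingReader : TextReader
{
    private const string Bad = "&pound;";
    private const string Good = "&#163;";

    private readonly TextReader _inner;
    private readonly Queue<char> _pending = new Queue<char>(); // repaired characters ready to hand out
    private int? _carry; // one character (or -1 for end of input) read ahead but not yet processed

    public PoundEntityFixingReader(TextReader inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public override int Peek()
    {
        Fill();
        return _pending.Count > 0 ? _pending.Peek() : -1;
    }

    public override int Read()
    {
        Fill();
        return _pending.Count > 0 ? _pending.Dequeue() : -1;
    }

    // Ensures _pending holds at least one character unless the input is exhausted.
    private void Fill()
    {
        if (_pending.Count > 0) return;

        int c = _carry ?? _inner.Read();
        _carry = null;
        if (c == -1) return;
        if (c != '&') { _pending.Enqueue((char)c); return; }

        // Saw '&': keep reading for as long as the input still matches "&pound;".
        var seen = new StringBuilder("&");
        while (seen.Length < Bad.Length)
        {
            int next = _inner.Read();
            if (next != Bad[seen.Length]) { _carry = next; break; } // mismatch: process this character later
            seen.Append((char)next);
        }

        string text = seen.ToString();
        foreach (char ch in text == Bad ? Good : text) _pending.Enqueue(ch);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

public static class PoundFixDemo
{
    // Reads the text of the root element through the repairing wrapper.
    public static string ReadRootText(TextReader raw)
    {
        using (var xml = new XmlTextReader(new PoundEntityFixingReader(raw)))
        {
            xml.MoveToContent();
            return xml.ReadElementContentAsString();
        }
    }
}
```

To use it with a URL, you would wrap the response stream in a StreamReader and pass that in place of the TextReader used here, so the document is still read incrementally rather than downloaded and rewritten first.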
EDIT: Alternatively you could hack the XML's DOCTYPE as it is read to add <!ENTITY pound "£">, which should make the rest of the document well-formed. There's probably another trick to add the entity to the XmlTextReader itself without resorting to modifying the XML at all, but I'm not aware of one.
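For what it's worth, XmlParserContext looks like a candidate for that kind of trick, since it accepts an internal DTD subset that is supplied separately from the document. The sketch below is untested against the real feed and rests on several assumptions: that the document has no DOCTYPE of its own, that the root element name is known in advance, and that a reader from XmlReader.Create is acceptable in place of XmlTextReader. PoundEntityContext and ReadRootText are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;

public static class PoundEntityContext
{
    // Declares the "pound" entity outside the document, then reads the root element's text.
    public static string ReadRootText(TextReader input, string rootName)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // required for the supplied subset to be processed
            XmlResolver = null                   // never fetch external DTDs or entities
        };

        var context = new XmlParserContext(
            null, null,                     // name table, namespace manager
            rootName,                       // DOCTYPE name
            null, null,                     // public id, system id
            "<!ENTITY pound \"&#163;\">",   // internal subset declaring the missing entity
            string.Empty, string.Empty,     // base URI, xml:lang
            XmlSpace.None);

        using (XmlReader reader = XmlReader.Create(input, settings, context))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }
}
```

If this behaves as expected, the XML itself is never modified: the reader simply treats &pound; as a declared entity and expands it while streaming.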
I'm wondering whether you are right in saying that this isn't well-formed. Perhaps it's parsing correctly but constructing a DOM tree in which the entities appear explicitly as nodes, and your application code is ignoring the entity nodes?