
Alternatives to XDocument

Hey guys, XDocument is being very finicky with one of the XML feeds I have to parse, and keeps giving me the error:

'=' is an unexpected token. The expected token is ';'. Line 1, position 576.

Which is basically XDocument crying about a loose "=" sign in the XML document.

I don't have any control over the source XML document, so I need to either get XDocument to ignore this error, or use some other class. Any ideas on either one?


If the document isn't well-formed XML (and my guess is that you have '&=' in the document, or some other string that looks like the start of an entity), then it's unlikely that any other XML parser is going to be any happier with it. Have you tried loading the document in, say, IE to see if it parses there, or pasting it into an XML validator? You can also try XmlDocument.Load() and see if it parses there; that's the next closest XML parser (aside from XmlReader, which takes a little bit of setting up).
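
To see exactly where the parser gives up, something like the sketch below may help. It is written as a C# 9+ top-level program, and the sample string is a made-up stand-in for the feed (a URL with an unescaped '&'), not the real data. It loads the text with XmlDocument and reports the line and position carried by the XmlException.

```csharp
using System;
using System.Xml;

// Made-up sample standing in for the real feed: a URL with an unescaped '&'.
string xml = "<feed><link>http://example.com/?a=1&b=2</link></feed>";

string report = "Parsed without errors.";
try
{
    var doc = new XmlDocument();
    doc.LoadXml(xml); // use doc.Load(path) instead for a file on disk
}
catch (XmlException ex)
{
    // LineNumber and LinePosition are 1-based and locate the reported error.
    report = $"Line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
}
Console.WriteLine(report);
```

If XmlDocument reports the same position as XDocument, the problem is in the feed itself rather than in one particular parser.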


It won't make for good XML, but if you just need to load a bad document, the HTML Agility Pack is a good tool. It tolerates many of the things that make HTML neither XHTML nor XML-like, so your erroneous XML input will likely parse too. The object model it exposes is similar to XmlDocument, e.g.:

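 using HtmlAgilityPack;

 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.xml");

 // SelectNodes returns null when nothing matches, so guard before iterating.
 var links = doc.DocumentNode.SelectNodes("//a[@href]");
 if (links != null)
 {
     foreach (HtmlNode link in links)
     {
         HtmlAttribute att = link.Attributes["href"];
         att.Value = FixLink(att); // FixLink is a placeholder for your own link-rewriting code
     }
 }
 doc.Save("file.htm");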

Or you can use Agility Pack to clean up the XML and then feed its clean output to a real XML parser for further processing.

This is a quick and dirty trick that I've used for one-time tasks. It's not necessarily recommended over a proper solution.

What I would recommend, if time permits, is to fix the erroneous XML content (e.g. in its string form, or using another tool) before feeding it to an XML parser.
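
As a rough sketch of the "fix it in string form" idea, assuming the problem really is a bare '&' (which is only a guess from the error message): the hypothetical helper EscapeBareAmpersands below rewrites every '&' that does not start one of the five predefined entities or a numeric character reference, then hands the result to XDocument. The sample feed text is invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper: escapes any '&' that is not the start of
// &amp; &lt; &gt; &quot; &apos; or a decimal/hex character reference.
static string EscapeBareAmpersands(string xml) =>
    Regex.Replace(xml, @"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)", "&amp;");

// Invented sample: one bare '&' in a URL, one correctly escaped '&amp;'.
string raw = "<feed><link>http://example.com/?a=1&b=2</link><title>Tom &amp; Jerry</title></feed>";

XDocument doc = XDocument.Parse(EscapeBareAmpersands(raw));
Console.WriteLine(doc.Root.Element("link").Value);
Console.WriteLine(doc.Root.Element("title").Value);
```

Two caveats with this approach: it also rewrites '&' inside CDATA sections and comments, where a bare '&' is legal, and it escapes references to any entities declared in a DTD. For a feed that has neither, that is probably acceptable.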


Take a look at the answers to this question: Parsing an XML/XHTML document but ignoring errors in C#

The best option I believe is to parse it in a try/catch block, remove the offending block inside the catch block, and re-parse.
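
One possible reading of that try/catch idea, sketched as a C# 9+ top-level program. Rather than deleting the offending block, this version escapes the '&' nearest to the reported error position and re-parses. ParseWithRepair and ToOffset are hypothetical helpers, the sample text is invented, and the sketch assumes the only defect is unescaped ampersands; any other error is rethrown.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Converts the 1-based line/column from XmlException to a 0-based string offset.
// Assumes lines end in "\n" or "\r\n"; returns -1 if the position is unavailable.
static int ToOffset(string text, int line, int column)
{
    int offset = 0;
    for (int current = 1; current < line; current++)
    {
        int newline = text.IndexOf('\n', offset);
        if (newline < 0) return -1;
        offset = newline + 1;
    }
    return offset + column - 1;
}

// Parses, and on each XmlException escapes the '&' just before the error, then retries.
static XDocument ParseWithRepair(string xml, int maxRepairs = 50)
{
    for (int repairs = 0; ; repairs++)
    {
        try
        {
            return XDocument.Parse(xml);
        }
        catch (XmlException ex) when (repairs < maxRepairs)
        {
            int offset = ToOffset(xml, ex.LineNumber, ex.LinePosition);
            if (offset < 0 || xml.Length == 0) throw;

            int amp = xml.LastIndexOf('&', Math.Min(offset, xml.Length - 1));
            if (amp < 0) throw; // no ampersand to blame

            // A ';' between the '&' and the error means that reference was
            // complete, so the error is about something else.
            int gap = Math.Max(0, Math.Min(offset, xml.Length) - (amp + 1));
            if (xml.IndexOf(';', amp + 1, gap) >= 0) throw;

            xml = xml.Insert(amp + 1, "amp;"); // "&" becomes "&amp;"
        }
    }
}

// Invented sample standing in for the feed.
string raw = "<feed><link>http://example.com/?a=1&b=2</link></feed>";
XDocument doc = ParseWithRepair(raw);
Console.WriteLine(doc.Root.Element("link").Value);
```

The cap on repairs is there so that an unexpected input cannot keep the loop running indefinitely.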
