XPath to parse invalid HTML in .NET 4.0?

Is it possible to use XPath in .NET without any external libraries? Is it natively supported, and can it parse "invalid HTML" (for example, tags that are never closed)?

I would really hate to have to use regular expressions for this, for the reasons laid out here: RegEx match open tags except XHTML self-contained tags

I've also had bad experiences with regular expressions when it comes to HTML parsing.


Yes, XPath is natively supported. No, it will not parse tag soup. You'll probably want to use the HTML Agility Pack for that instead.


XPath has been supported in .NET since version 1.0. However, the built-in XPath classes only work on well-formed XML. Not all valid HTML is well-formed XML: an unclosed tag such as `<p>` or `<br>` is legal HTML, but the XML parser rejects it.
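A minimal sketch of that behaviour in C#, using only the standard `System.Xml` / `System.Xml.XPath` classes. The class and method names (`WellFormednessDemo`, `TryLoadForXPath`) are hypothetical, made up for illustration. The sketch loads each snippet into an `XPathDocument`, which parses with the normal XML reader and does not repair tag soup, and reports whether loading succeeded:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

// Hypothetical demo class: checks whether markup can be loaded for XPath queries.
public static class WellFormednessDemo
{
    // Returns true if the markup parses as XML (so XPath can be run on it),
    // false if the XML parser rejects it as not well-formed.
    public static bool TryLoadForXPath(string markup)
    {
        try
        {
            using (var reader = new StringReader(markup))
            {
                // XPathDocument parses the whole input eagerly with the XML parser.
                new XPathDocument(reader);
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // XHTML-style markup: every element is closed.
        Console.WriteLine(TryLoadForXPath("<html><body><p>One</p><br/></body></html>")); // True
        // HTML-style markup: <p> and <br> are left open, so the XML parser throws.
        Console.WriteLine(TryLoadForXPath("<html><body><p>One<br></body></html>"));      // False
    }
}
```

The second call fails because the parser reaches `</body>` while `<br>` is still open.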


Yes. See System.Xml.XPath.XPathExpression. It lives in System.Xml.dll, which is included on any machine that has the .NET Framework installed.
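A short sketch of compiling an `XPathExpression` and evaluating it against a well-formed document. It assumes the input has no XML namespace. The class and method names (`XPathExpressionDemo`, `GetLinkTargets`) are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Hypothetical demo class: runs a compiled XPath expression over well-formed markup.
public static class XPathExpressionDemo
{
    // Collects the href attribute of every <a> element, in document order.
    public static string[] GetLinkTargets(string wellFormedMarkup)
    {
        XPathDocument doc;
        using (var reader = new StringReader(wellFormedMarkup))
        {
            doc = new XPathDocument(reader);
        }
        XPathNavigator nav = doc.CreateNavigator();

        // Compile once; the compiled expression can be reused with other navigators.
        XPathExpression expr = XPathExpression.Compile("//a/@href");

        var results = new List<string>();
        XPathNodeIterator nodes = nav.Select(expr);
        while (nodes.MoveNext())
        {
            results.Add(nodes.Current.Value);
        }
        return results.ToArray();
    }

    public static void Main()
    {
        string page = "<html><body><a href='/one'>1</a><p><a href='/two'>2</a></p></body></html>";
        foreach (string href in GetLinkTargets(page))
        {
            Console.WriteLine(href); // prints /one, then /two
        }
    }
}
```

Real XHTML usually declares `xmlns="http://www.w3.org/1999/xhtml"`. In that case, `//a` would match nothing unless the prefix is bound through an `XmlNamespaceManager` and used in the expression.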

Not sure about the unclosed HTML tags question. A small experiment should answer that pretty quickly.
