
Open an HTML Document with XmlDocument.Load

I'd like to open an HTML document (a string read from the web with a StreamReader) by creating an XmlDocument this way:

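string html = reader.ReadToEnd();   // reader: the StreamReader over the web response
XmlDocument doc = new XmlDocument();
doc.LoadXml(html);                  // LoadXml parses a string; Load(string) expects a file path or URL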

But because the HTML document starts with this DOCTYPE declaration:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

it tells me that the document is invalid. Is there any way to work around this?


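Ordinary HTML, even when it is valid HTML, is generally not well-formed XML: HTML allows unclosed elements such as <br> and unquoted attribute values, which an XML parser rejects.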
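To illustrate, here is a minimal C# sketch (the class and method names are hypothetical) that reports whether XmlDocument.LoadXml accepts a piece of markup. An unclosed <br> is fine in HTML but makes the XML parser throw an XmlException.

```csharp
using System.Xml;

// Hypothetical helper: shows that markup which is acceptable HTML can still
// be rejected by XmlDocument, because XML requires every element to be closed.
public static class HtmlVsXmlDemo
{
    // Returns true if LoadXml accepts the markup, false if it throws XmlException.
    public static bool LoadsAsXml(string markup)
    {
        var doc = new XmlDocument();
        try
        {
            doc.LoadXml(markup);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For example, LoadsAsXml("<p>line one<br>line two</p>") returns false (the </p> end tag does not match the still-open <br>), while the same markup written with <br/> returns true.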

HtmlAgilityPack is a popular third-party, open-source library that you can use to solve this problem:

  • http://www.google.co.uk/search?q=htmlagilitypack
  • How to use HTML Agility pack


If you're sure that the HTML is well-formed XML, I imagine you could simply replace the HTML DOCTYPE with an XML declaration.
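A minimal sketch of that idea, assuming the page really is well-formed XHTML and its DOCTYPE has no internal subset ("[...]"), which could contain a '>' that the regex below would stop at. XhtmlLoader and LoadWithoutDoctype are hypothetical names. Removing the DOCTYPE is enough on its own, since the XML declaration (<?xml ...?>) is optional.

```csharp
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper: strips the DOCTYPE so XmlDocument never sees the
// XHTML DTD reference, then parses the rest as plain XML.
public static class XhtmlLoader
{
    // Matches a DOCTYPE without an internal subset, such as the
    // XHTML 1.0 Transitional declaration from the question.
    private static readonly Regex Doctype =
        new Regex(@"<!DOCTYPE[^>]*>", RegexOptions.IgnoreCase);

    // Throws XmlException if the remaining markup is not well-formed XML.
    public static XmlDocument LoadWithoutDoctype(string xhtml)
    {
        // Remove only the first DOCTYPE declaration.
        string stripped = Doctype.Replace(xhtml, string.Empty, 1);
        var doc = new XmlDocument();
        doc.LoadXml(stripped);
        return doc;
    }
}
```

Note that the root element of an XHTML page is normally in the http://www.w3.org/1999/xhtml namespace, so XPath queries against the resulting document need an XmlNamespaceManager.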


First, you have to validate that the XHTML is a valid XHTML document (which also means it is well-formed XML).

Paste your XHTML code into the W3C validator and review the output: http://validator.w3.org/#validate_by_input
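If you'd rather check locally before calling XmlDocument, a sketch along these lines (names hypothetical) reads the string with an XmlReader and reports the first error. Unlike the W3C validator it only checks XML well-formedness, not validity against the XHTML DTD, but well-formedness is what XmlDocument needs. The settings skip the DOCTYPE and set no resolver, so nothing is fetched from w3.org; named entities such as &nbsp; that only the DTD defines will then likely be reported as errors.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: a local well-formedness check, not a DTD validation.
public static class WellFormedness
{
    // Returns null if the markup is well-formed XML, otherwise the parser's message.
    public static string Check(string markup)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore, // skip the DOCTYPE entirely
            XmlResolver = null                    // never download external resources
        };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(markup), settings))
            {
                while (reader.Read()) { }         // read to the end; throws on the first error
            }
            return null;
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }
}
```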

Good luck!


You can also use HTML Tidy (for example Tidy.NET, a .NET port) to convert the HTML into well-formed XHTML first.
