how do you serialize HTML in C#?

I think I know how to use XSD.exe to create C# classes from XML that can be used with the XmlSerializer class to serialize and verify the XML document.

Is there a way to do the same sort of thing with an HTML document? I have tried but the xsd command line says that the remote name www.w3.org cannot be resolved.

At a minimum, is there a way to use C# to find out if an HTML file is valid?


The HTMLAgilityPack is an open source library that parses HTML for you. You can then search and manipulate the structure of the document quite easily.

It's quite forgiving with the HTML you give it, so I'm not sure it's a good way of checking whether you have a strictly valid XHTML document. But it should be able to parse anything a modern browser can.
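
As a rough illustration of the parse-then-query workflow, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the HtmlCheck class name and the sample markup are made up for this example. ParseErrors only lists problems the parser happened to notice, so an empty list should not be read as proof of validity.

```csharp
using System;
using HtmlAgilityPack; // assumption: the "HtmlAgilityPack" NuGet package is installed

static class HtmlCheck // hypothetical name, for illustration only
{
    static void Main()
    {
        // Sample markup for the sketch; the <b> element is never closed.
        const string html =
            "<html><body><p>Hello <b>world <a href='/docs'>docs</a></p></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Report whatever problems the parser recorded while reading the markup.
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.WriteLine("Line {0}, column {1}: {2}",
                error.Line, error.LinePosition, error.Reason);
        }

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                Console.WriteLine(link.GetAttributeValue("href", string.Empty));
            }
        }
    }
}
```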


If it's XHTML that you're trying to validate, you can do it like this:

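using System;
using System.Xml;
using System.Xml.Schema;

static void validate(string filename)
{
    XmlReaderSettings settings = new XmlReaderSettings();
    // DtdProcessing.Parse replaces the obsolete ProhibitDtd = false (.NET 4.0 and later).
    settings.DtdProcessing = DtdProcessing.Parse;
    settings.ValidationType = ValidationType.DTD;
    settings.ValidationEventHandler +=
        new ValidationEventHandler(ValidationCallBack);
    // XhtmlUrlResolver is not a framework class; its definition is not included here.
    settings.XmlResolver = new XhtmlUrlResolver();

    // Create the XmlReader object and dispose of it once parsing is done.
    using (XmlReader reader = XmlReader.Create(filename, settings))
    {
        // Parse the whole file; validation errors are reported through the callback.
        while (reader.Read()) { }
    }
}

// Display any validation errors.
private static void ValidationCallBack(object sender, ValidationEventArgs e)
{
    Console.WriteLine("Validation Error: {0}", e.Message);
}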

It will be a bit slow because it downloads the DTD files from the W3C web site.
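
One possible way to avoid the download, assuming .NET Framework 4.0 or later (or .NET Core), is the framework's XmlPreloadedResolver, which carries embedded copies of the XHTML 1.0 DTDs and entity files. The sketch below swaps that resolver in for the XhtmlUrlResolver used above. The OfflineXhtmlValidator class and its Validate method are names invented for this example, and it collects messages in a list instead of printing them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Resolvers;
using System.Xml.Schema;

static class OfflineXhtmlValidator // hypothetical name, for illustration only
{
    // Returns the DTD validation messages for an XHTML 1.0 document held in a string.
    public static List<string> Validate(string xhtml)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            ValidationType = ValidationType.DTD,
            // Serves the known XHTML 1.0 DTDs from embedded copies; with no
            // fallback resolver configured, nothing is fetched from the network.
            XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10)
        };
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(xhtml), settings))
        {
            // Read to the end so the whole document is validated.
            while (reader.Read()) { }
        }
        return errors;
    }
}
```

A document that declares the XHTML 1.0 Strict DOCTYPE and conforms to it should come back with an empty list. Malformed XML still throws an XmlException rather than producing a validation message, so callers may want to catch that separately.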


To deserialize/parse HTML, I would also recommend HTMLAgilityPack. To validate the HTML, you could try running HTML Tidy. For XHTML, you can obtain an XSD and validate against that instead.
