
Parse page HTML output

I'd like to know one or more ways to parse the HTML output of a page. I want to detect certain patterns in the HTML that will be sent to the client and log some information if they are present.


Everything you need is in the Page.Render method: override it and do whatever you want in there.

using System.IO;
using System.Text;
using System.Web.UI;

// inside your Page class:
protected override void Render(HtmlTextWriter writer)
{
    // Render the page into an in-memory buffer instead of straight to the response.
    StringBuilder stringBuilder = new StringBuilder();
    using (StringWriter stringWriter = new StringWriter(stringBuilder))
    using (HtmlTextWriter htmlTextWriter = new HtmlTextWriter(stringWriter))
    {
        // The HtmlTextWriter writes through the StringWriter into the StringBuilder.
        base.Render(htmlTextWriter);
    }

    string theHtml = stringBuilder.ToString(); // the full page HTML captured as a string

    //---------------------------------------------
    // Inspect theHtml here (look for patterns, log what you find, etc.)
    //---------------------------------------------

    // Send the captured HTML to the client with the original writer.
    writer.Write(theHtml);
}
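As a sketch of the "inspect theHtml here" step, the helper below scans the captured HTML with a regular expression and logs each match. The class name `HtmlPatternLogger`, the example pattern (absolute `http://` links in `href` attributes) and logging through `System.Diagnostics.Trace` are all assumptions for illustration, not part of the original answer; substitute your own patterns and logger.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper: finds pattern matches in rendered HTML and logs them.
public static class HtmlPatternLogger
{
    // Example pattern (assumption): href="http://..." or href='http://...',
    // case-insensitive, with optional whitespace around '='.
    private static readonly Regex InsecureLink =
        new Regex(@"href\s*=\s*[""'](http://[^""']+)[""']",
                  RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns every captured URL and writes one trace line per match.
    public static List<string> FindAndLog(string html)
    {
        var found = new List<string>();
        foreach (Match m in InsecureLink.Matches(html))
        {
            string url = m.Groups[1].Value;
            found.Add(url);
            Trace.WriteLine("Insecure link in page output: " + url);
        }
        return found;
    }
}
```

Inside the `Render` override you would call `HtmlPatternLogger.FindAndLog(theHtml);` between the dashed comments. A regular expression is enough for simple textual patterns like this one; if you need to reason about the document's structure, a real HTML parser is the safer choice.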


It depends on what exactly you mean by "parse", but a library such as the HTML Agility Pack can build an XML-like tree from an HTML document, in effect a proper HTML DOM data structure. You can then query it with XPath or LINQ, or even convert it straight to XML.

