In C# (.NET), how can I parse HTML?

I have a WebBrowser control and I navigate it to some address. When the page has loaded, I want to pick out only the URLs from its HTML. Is it possible to handle the HTML like XML? If so, I could use other DOM properties too. Is there any XML-like container object I can pass the HTML into? Thank you.


Sounds like you need to use the HTML Agility Pack.

Also see this other Stack Overflow question:

C# Is there a LINQ to HTML, or some other good .Net HTML manipulation API?
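
As a rough illustration of the HTML Agility Pack suggestion, here is a minimal sketch that collects the href of every anchor in an HTML string. It assumes the HtmlAgilityPack NuGet package is installed; the LinkExtractor class and ExtractLinks method are hypothetical names made up for this example, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper, named only for this example.
public static class LinkExtractor
{
    // Returns the href value of every <a> element that has an href attribute.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        }

        return links;
    }
}
```

With a WebBrowser control, one way to feed it would be LinkExtractor.ExtractLinks(webBrowser1.DocumentText) once the document has finished loading. Note that HtmlAgilityPack.HtmlDocument and System.Windows.Forms.HtmlDocument share a name, so a file that imports both namespaces needs to qualify the type.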


Yes, you can use MSHTML to navigate the DOM. You would need to add a reference to Microsoft.mshtml in your project. An example of using it to get all links in a document would be:

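// At the top of the file:
using System.Diagnostics;      // Debug
using System.Windows.Forms;    // WebBrowser, HtmlDocument, HtmlElement
using mshtml;                  // IHTMLAnchorElement (needs the Microsoft.mshtml reference)

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    HtmlDocument doc = webBrowser1.Document;

    foreach (HtmlElement element in doc.Links)
    {
        // doc.Links can also contain <area> elements, so cast safely and skip non-anchors.
        IHTMLAnchorElement link = element.DomElement as IHTMLAnchorElement;
        if (link != null)
        {
            Debug.WriteLine(link.href);
        }
    }
}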