In C#.net, how can I parse HTML?
I have a WebBrowser control and navigate it to some address. When the page has loaded, I want to pick out only the URLs from its HTML. Is it possible to handle the HTML like XML? If so, I could use other DOM properties too. Is there any XML-like container object I can pass the HTML into? Thank you.
Sounds like you need to use the HTML Agility Pack.
Also see this other Stack Overflow question:
C# Is there a LINQ to HTML, or some other good .Net HTML manipulation API?
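To give a rough idea of what the HTML Agility Pack approach might look like, here is a minimal sketch that pulls the href of every anchor out of an HTML string with an XPath query. It assumes the third-party HtmlAgilityPack NuGet package is installed; the LinkExtractor class and ExtractLinks method are hypothetical names made up for this illustration.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package (assumed to be installed)

// Hypothetical helper, not part of any library.
public static class LinkExtractor
{
    // Returns the href value of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        // Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return links;
    }
}
```

With a WebBrowser control, something like LinkExtractor.ExtractLinks(webBrowser1.DocumentText) could be called once the document has finished loading. The values come back as written in the markup, so relative URLs are not resolved.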
Yes, you can use MSHTML to navigate the DOM. You would need to add a reference to Microsoft.mshtml in your project. An example of using it to get all links in a document would be:
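// Needs: using System.Diagnostics; using System.Windows.Forms; using mshtml;
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    HtmlDocument doc = webBrowser1.Document;
    foreach (HtmlElement element in doc.Links)
    {
        // DomElement exposes the underlying MSHTML COM object for this element.
        HTMLAnchorElement link = (HTMLAnchorElement) element.DomElement;
        Debug.WriteLine(link.href);
    }
}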