HTML string reader
I need to load HTML and parse it. I think it should be something simple: I pass in a string containing HTML, and it reads the string into a DOM-like object so I can search and parse the content, which would make scraping and similar tasks easier.
Do you know of anything like that?
Thanks
HTML Agility Pack. It has a similar API to XmlDocument, for example (from the examples page):
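// Requires the HtmlAgilityPack package (using HtmlAgilityPack;).
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// Recent releases expose the root as DocumentNode.
// SelectNodes returns null when nothing matches.
var links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        att.Value = FixLink(att); // FixLink is your own helper, not part of the library
    }
}
doc.Save("file.htm");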
(You should also be able to use LoadHtml to load a string of HTML, rather than loading from a path.)
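Since the question is about parsing from a string, here is a minimal sketch of the LoadHtml route. It assumes the HtmlAgilityPack NuGet package is referenced; the class name LinkExtractor and the method ExtractLinks are hypothetical names made up for this illustration, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class LinkExtractor
{
    // Hypothetical helper: returns the href value of every <a href="..."> in an HTML string.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file path

        var result = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return result;
        }

        foreach (HtmlNode anchor in anchors)
        {
            // The second argument is the fallback used if the attribute is missing.
            result.Add(anchor.GetAttributeValue("href", ""));
        }
        return result;
    }
}
```

You would call it with something like LinkExtractor.ExtractLinks(someHtmlString) and then loop over the returned list. The null check matters because an HTML fragment with no links would otherwise throw in the foreach.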
If you're running in-browser, you should be able to use the HTML DOM Bridge: load the HTML into it and walk the DOM tree from there.