HtmlNode collection and parsing

2022-12-08 04:51 Asked by:

I'm trying to extract the text contained in a web page, so I'm using a third-party tool, Html Agility Pack. They give this example:

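// Download and parse the page.
HtmlWeb htmlWeb = new HtmlWeb();
HtmlDocument doc = htmlWeb.Load("http://www.msn.com/");

// Select every <a> element that has an href attribute.
// SelectNodes returns null when nothing matches, so check before looping.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        Response.Write(link.Attributes["href"].Value + "<br>");
    }
}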

This works for grabbing all the links contained in a page, but I want to get all the text data contained in that page. Is that possible?

Yep, it's possible. Download the source code for the HtmlAgilityPack and take a look at the Html2Txt sample project, particularly HtmlConvert.cs. You can pretty much copy/paste their method into whatever it is you're doing.

Or, for that matter, compile the sample project as-is and set a reference to the binaries. HtmlAgilityPack.Samples.HtmlToText.Convert() will do exactly what you need.
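If you would rather not pull in the sample project, the general idea can be sketched in a few lines: walk the node tree, skip comments and the contents of script and style elements, and collect what is left. This is not the sample's code, only a minimal sketch along similar lines. The class name PageText and its methods are made up for illustration and are not part of HtmlAgilityPack.

```csharp
using System.Net;
using System.Text;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack or its samples.
public static class PageText
{
    // Parses an HTML string and returns its visible text, one fragment per line.
    public static string FromHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return FromDocument(doc);
    }

    // Same as above, for a document that is already loaded (e.g. via HtmlWeb.Load).
    public static string FromDocument(HtmlDocument doc)
    {
        var sb = new StringBuilder();
        Collect(doc.DocumentNode, sb);
        return sb.ToString();
    }

    private static void Collect(HtmlNode node, StringBuilder sb)
    {
        switch (node.NodeType)
        {
            case HtmlNodeType.Comment:
                // Comments carry no page text.
                return;

            case HtmlNodeType.Text:
                // Text inside <script> and <style> is code, not content.
                string parentName = node.ParentNode?.Name;
                if (parentName == "script" || parentName == "style")
                    return;

                // Decode entities such as &amp; and drop whitespace-only fragments.
                string text = WebUtility.HtmlDecode(node.InnerText).Trim();
                if (text.Length > 0)
                    sb.AppendLine(text);
                return;

            default:
                // Document and element nodes: recurse into the children.
                foreach (HtmlNode child in node.ChildNodes)
                    Collect(child, sb);
                return;
        }
    }
}
```

With the document from the question, that would be something like PageText.FromDocument(htmlWeb.Load("http://www.msn.com/")). Each text fragment ends up on its own line, which is cruder than a real HTML-to-text conversion (no handling of block versus inline elements), so treat it as a starting point.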

You are using an XPath selector there. If you select all nodes ("//*") and then perform the foreach, would it work?

PS: what programming language is this?
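For what it's worth, the snippet in the question is C#. One caveat with selecting every element and writing out its InnerText: each element's InnerText includes the text of its descendants, so the same text would be printed several times. Selecting the text nodes themselves avoids that. A rough sketch, assuming the "//text()" XPath is acceptable for your pages; the TextNodes class is a made-up name for illustration, not something the library provides.

```csharp
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class TextNodes
{
    // Returns the non-empty text fragments of a parsed page.
    public static List<string> Select(HtmlDocument doc)
    {
        var result = new List<string>();

        // text() matches text nodes only, so element text is not repeated.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//text()");
        if (nodes == null)
            return result; // SelectNodes returns null when nothing matches

        foreach (HtmlNode node in nodes)
        {
            // Skip the contents of <script> and <style>.
            string parentName = node.ParentNode?.Name;
            if (parentName == "script" || parentName == "style")
                continue;

            // Decode entities and drop whitespace-only fragments.
            string text = WebUtility.HtmlDecode(node.InnerText).Trim();
            if (text.Length > 0)
                result.Add(text);
        }
        return result;
    }
}
```

In the question's setting you would loop over TextNodes.Select(doc) and write each entry with Response.Write. Since the fragments are decoded, it is probably worth passing them through WebUtility.HtmlEncode before writing them back into a page.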
