How to extract keywords from an HTML page in C#?

Basically I want to extract keywords or words or tokens that are present in the webpage after removing the stopwords. Does anybody know how to do this? Code in C# would be appreciated.


Use an HTML parsing library like the HTML Agility Pack.

Once you load an HTML document with it, you can query it with XPath syntax; it exposes the HTML in much the same way as an XmlDocument does.
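As a rough illustration of that approach, here is a minimal sketch that loads an HTML string, drops script and style elements via XPath, and returns the remaining text. It assumes the HtmlAgilityPack NuGet package is referenced; the class and method names (HtmlTextExtractor, GetVisibleText) are made up for this example.

```csharp
using System.Linq;
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package is installed

// Hypothetical helper for illustration; not part of the library.
public static class HtmlTextExtractor
{
    public static string GetVisibleText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var hidden = doc.DocumentNode.SelectNodes("//script|//style");
        if (hidden != null)
        {
            // Copy to a list first so removal does not disturb the enumeration.
            foreach (var node in hidden.ToList())
            {
                node.Remove();
            }
        }

        // InnerText concatenates all remaining text nodes;
        // DeEntitize turns entities such as &amp; back into characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
}
```

To work from a URL instead of a string, `new HtmlWeb().Load(url)` returns an HtmlDocument that can be processed the same way.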


The HTML Agility Pack that Oded mentions will help you get at the plain text inside the HTML, but to extract keywords from the webpage after removing the stopwords you'll need to do more work. There's a good, informative answer from Joseph Turian to this question: "How do I extract keywords used in text?"
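For the "more work" part, one simple baseline is to tokenize the plain text, drop stopwords, and rank the remaining words by frequency. The sketch below does only that, using the standard library. The stopword list is a tiny illustrative assumption, and the names (KeywordExtractor, Extract) are hypothetical. It is a plain frequency count, not the method described in the linked answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration: frequency-based keyword ranking.
public static class KeywordExtractor
{
    // Tiny sample stopword list (an assumption); substitute a fuller list in practice.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to" },
        StringComparer.OrdinalIgnoreCase);

    public static List<KeyValuePair<string, int>> Extract(string text, int top)
    {
        // Tokens are runs of letters or digits; everything else is a separator.
        return Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{Nd}]+")
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(w => w.Length > 1 && !StopWords.Contains(w))
            .GroupBy(w => w)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)             // most frequent first
            .ThenBy(p => p.Key, StringComparer.Ordinal)  // stable order for ties
            .Take(top)
            .ToList();
    }
}
```

For example, `KeywordExtractor.Extract("The cat and the hat. The cat sat on the mat.", 2)` yields "cat" with a count of 2, followed by "hat" with a count of 1. Raw counts favor words that are common everywhere, so weighting schemes such as TF-IDF are a usual next step.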
