
Is there a list of the most common English words for indexing text for search?

Is there a freely available list of the most common English words to remove from text when creating a search index?


Wikipedia gives the 100 most frequent lemmas: http://en.wikipedia.org/wiki/Most_common_words_in_English

That might be good for a start; the article provides some good references.
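For what it's worth, here is a minimal sketch of how such a list might be applied before indexing. The `STOP_WORDS` set is only a small illustrative sample of very frequent words, not a complete list, and the `index_terms` helper and its tokenizing regex are hypothetical names and choices made for this example:

```python
import re

# Illustrative sample only: a real index would load a full list
# (for example, one built from the references in the Wikipedia article).
STOP_WORDS = frozenset({"the", "be", "to", "of", "and", "a", "in", "that", "have", "i"})


def index_terms(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    # Assumption: a token is a run of ASCII letters, digits, or apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in stop_words]


print(index_terms("The quick brown fox jumps over the lazy dog in a field"))
# ['quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog', 'field']
```

Keeping the list in a set makes each lookup constant time, so the filter stays cheap even if you swap in a much longer list.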


Here are the ones (plus characters) used in the SQL Server 2005 noise word list; I assume the SQL Server 2008 stopwords are similar.

And the MSDN documentation on it is here.

Hope this helps
