Trying to create a 'trending words/phrases' engine but need to filter out common words

I'd like to parse strings coming into my system and keep a count of each word in a separate table. The problem is that many common words like 'the', 'at', etc. will be counted when they shouldn't be. I would prefer not to create a dictionary by hand. Does anyone know of a decent dictionary of common words I can match against so that they are excluded? Thanks.


What you're describing is called a "stop words" list.

http://en.wikipedia.org/wiki/Stop_words

You can find one here:

http://truereader.com/manuals/onix/stopwords1.html
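
Once you have a list, the filtering itself is small. Here is a rough sketch in Python, assuming the stop words have been loaded into a set; the `STOP_WORDS` set and the `count_words` helper below are made up for illustration and hold only a handful of words, not the full list linked above.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (an assumption for this sketch);
# in practice, load the full list from a file into a set.
STOP_WORDS = {"the", "at", "a", "an", "and", "of", "to", "in", "is", "on"}

def count_words(text, stop_words=STOP_WORDS):
    """Count words in text, ignoring case and skipping stop words."""
    # Lowercase first so "The" and "the" are treated as the same word.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

counts = count_words("The cat sat at the door and the dog sat on the mat")
print(counts.most_common(3))
```

Keeping the stop words in a set makes each lookup cheap, so the check can run on every incoming string before the counts are written to your table.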
