Trying to create a 'trending words/phrases' engine but need to filter out common words
I'd like to parse strings coming into my system and keep a count of each word in a separate table. The problem is that many common words like 'the', 'at', etc. will be counted when they shouldn't be. I would prefer not to create a dictionary by hand. Does anyone know of a decent dictionary of common words I can match against so they are excluded? Thanks.
What you're describing is a "stop words" list.
http://en.wikipedia.org/wiki/Stop_words
You can find one here:
http://truereader.com/manuals/onix/stopwords1.html
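Once you have a list, the filtering step is just a set lookup before you increment the count. Here is a minimal sketch in Python (the question doesn't say which language is in use, so that is an assumption). `STOP_WORDS` is a tiny hand-picked sample standing in for a real list, and `count_words` is a hypothetical helper name, not something from a library.

```python
import re
from collections import Counter

# Tiny illustrative sample only; in practice load a full stop-word list
# (for example, one word per line from a downloaded file) into this set.
STOP_WORDS = {"the", "at", "a", "an", "and", "of", "to", "in", "is", "on"}

def count_words(text, stop_words=STOP_WORDS):
    """Count the words in text, skipping anything found in stop_words."""
    # Lowercase first so 'The' and 'the' are treated as the same word,
    # then pull out runs of letters, allowing one inner apostrophe (e.g. "don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(w for w in words if w not in stop_words)

counts = count_words("The cat sat at the door and the cat slept.")
print(counts.most_common(3))
```

Keeping the stop words in a set makes each lookup constant time, so the filter adds very little cost per incoming string. The resulting counts could then be written to your word-count table in whatever way your system already does that.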