Python script to find word frequencies of a given document
I am looking for a simple script that can find the frequencies of words in a given document (probably using the Porter stemmer).
Is there any library or simple script that does this process?
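Use nltk:
import nltk
YOUR_STRING = "Your words"
# Split on whitespace; this needs no extra tokenizer data
words = YOUR_STRING.split()
freq_dist = nltk.FreqDist(words)
# In NLTK 3, FreqDist subclasses collections.Counter and keys() is not
# sorted by frequency, so rank the words with most_common() instead
tokens = [word for word, count in freq_dist.most_common()]
# 50 most frequent
most_frequent = tokens[:50]
# 50 least frequent
least_frequent = tokens[-50:]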
You should be able to count words. Use a collections.Counter or a dict, depending on what you need. That part is easy, but if it isn't, you can find the answer by searching on SO itself.
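For reference, the counting step might look like the minimal sketch below. The function name word_frequencies and the tokenization rule (lowercase, then runs of letters, digits, and apostrophes) are assumptions made for illustration; the question does not say how words should be split.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its count."""
    # Assumed tokenization: runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


sample = "The cat saw the other cat. The end."
freqs = word_frequencies(sample)
print(freqs.most_common(2))  # [('the', 3), ('cat', 2)]
```

To process a file instead of a string, read its contents first and pass them to the same function. A plain dict works too, but Counter already provides most_common() for ranking.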
I think you also want the Porter Stemmer, which has a Python version at http://tartarus.org/~martin/PorterStemmer/python.txt
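One way to combine stemming with counting is to map each word through a stemmer before tallying, as in the sketch below. The names stemmed_frequencies and toy_stem are hypothetical, and toy_stem is only a stand-in that strips a single trailing "s"; it is not the Porter algorithm.

```python
from collections import Counter


def stemmed_frequencies(words, stem):
    """Count words after mapping each one through the `stem` callable."""
    return Counter(stem(w.lower()) for w in words)


def toy_stem(word):
    # Toy stand-in for a real stemmer: strips one trailing "s" only.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


words = "Cats chase cats and dogs chase a cat".split()
counts = stemmed_frequencies(words, toy_stem)
print(counts.most_common(2))  # [('cat', 3), ('chase', 2)]
```

If NLTK is installed, nltk.stem.PorterStemmer().stem should be usable as the stem argument in place of toy_stem, so that inflected forms of a word are counted together.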