开发者

Python script to find word frequencies of a given document

I am looking for a simple script that can find frequencies of words for a given document (probably by using portable stemmer开发者_开发问答).

Is there any library or simple script that does this process?


use nltk

import nltk

YOUR_STRING = "Your words"

words = [w for w in YOUR_STRING.split()]
freq_dist = nltk.FreqDist(words)

tokens = freq_dist.keys()

#50 most frequent
most_frequent = tokens[:50]

#50 least frequent
least_frequent = tokens[-50:]


You should be able to count words. Use a collections.Counter or a dict, depending on what you need. That part is easy, but if it isn't you can find the answer by searching on SO itself.

I think you also want the Porter Stemmer, which has a Python version at http://tartarus.org/~martin/PorterStemmer/python.txt

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜