Need to create a histogram in Python for a corpus

import nltk
from nltk.book import *
from nltk.corpus import brown
corpus_text = brown.words()
word_freq = FreqDist(corpus_text)
word_hist = dict()

for k,v in word_freq.iteritems():
   if key in word_hist:
      word_hist[v] = word_hist[v] + 1
   else:
      word_hist[v] = 1 

print word_hist.viewkeys()
print word_hist.viewvalues()

I'm making a mistake in the dictionary handling here. I need to create a dictionary whose keys are the words from the frequency distribution (word_freq) and whose values are the counts of the corresponding words. How do I perform this increment?

I'm certain that

      word_hist[v] = word_hist[v] + 1
   else:
      word_hist[v] = 1

has a bug.


Of course. It seems you are replacing the word_hist dict with one of its values (plus 1). Try

word_hist[v] = word_hist[v] + 1

or even better

word_hist[v] += 1

instead.

EDIT: There is another bug:

for k,v in word_freq.iteritems():
   if key in word_hist:
      word_hist[v] = word_hist[v] + 1
   else:
      word_hist[v] = 1

makes no sense. key is tested for presence in word_hist, but then v is used.

I don't know what key is, but either use k or v for both.
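As a minimal sketch of that fix, here is the loop with the same variable used for both the membership test and the update. It assumes the goal is a histogram of counts (how many words occur n times), so it keys on the count. The toy word_freq dict is made-up data standing in for the FreqDist, and the loop uses Python 3's .items() where the question's Python 2 code uses .iteritems().

```python
# Hypothetical toy data standing in for FreqDist(brown.words()).
word_freq = {"the": 3, "cat": 1, "sat": 1, "on": 1, "mat": 3}

word_hist = {}
for word, count in word_freq.items():  # .iteritems() in Python 2
    # Test and update the SAME key (the count), not two different names.
    if count in word_hist:
        word_hist[count] += 1
    else:
        word_hist[count] = 1

print(word_hist)
```

If the intent is instead to key on the word, use word in both places; the point is only that the test and the assignment must agree.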


from collections import defaultdict
word_hist = defaultdict(int)

for k,v in word_freq.iteritems():
    word_hist[v] +=1
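The same idea can also be sketched without an explicit loop by using collections.Counter. This is only an illustration: the sentence below is invented sample text, Counter stands in for the FreqDist so the snippet runs without NLTK, and the name freq_hist is not from the original post.

```python
from collections import Counter

# Invented sample text; in the question this would be brown.words().
words = "the cat sat on the mat the end".split()

# Word -> number of occurrences (plays the role of word_freq).
word_freq = Counter(words)

# Count -> number of distinct words with that count.
freq_hist = Counter(word_freq.values())

for count, n_words in sorted(freq_hist.items()):
    print(count, n_words)
```

Like defaultdict(int), a Counter returns 0 for a missing key, so no if/else branch is needed.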


That definitely has a bug, but so does the previous line.

if key in word_hist:
    word_hist[v] = word_hist[v] + 1
else:
    word_hist[v] = 1

should be

if k in word_hist:
    word_hist[k] = word_hist[k] + 1
else:
    word_hist[k] = 1
 

You don't need to take v from word_freq.
