Need to create a histogram in Python for a corpus
import nltk
from nltk.book import *
from nltk.corpus import brown
corpus_text = brown.words()
word_freq = FreqDist(corpus_text)
word_hist = dict()
for k,v in word_freq.iteritems():
    if key in word_hist:
        word_hist[v] = word_hist[v] + 1
    else:
        word_hist[v] = 1
print word_hist.viewkeys()
print word_hist.viewvalues()
I'm making a mistake in the dictionary handling here. I need to create a dictionary whose keys are the words from the FreqDist and whose values are the counts of the corresponding words. How do I perform this increment?
I'm certain that
        word_hist[v] = word_hist[v] + 1
    else:
        word_hist[v] = 1
has a bug.
Of course. It seems you are replacing the word_hist dict with one of its values (plus 1). Try
word_hist[v] = word_hist[v] + 1
or even better
word_hist[v] += 1
instead.
EDIT: There is another bug:
for k,v in word_freq.iteritems():
    if key in word_hist:
        word_hist[v] = word_hist[v] + 1
    else:
        word_hist[v] = 1
makes no sense. key is tested for presence in word_hist, but then v is used. I don't know what key is, but either use k or v for both.
from collections import defaultdict
word_hist = defaultdict(int)
for k,v in word_freq.iteritems():
    word_hist[v] += 1
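To see what the defaultdict loop above computes, here is a minimal self-contained sketch. It assumes Python 3 (where iteritems() becomes items()) and uses collections.Counter on a small made-up word list as a stand-in for FreqDist(brown.words()), so it runs without the NLTK data. The names sample_words and count_hist are illustrative only and do not come from the thread.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for brown.words()
sample_words = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]

# Counter stands in for nltk's FreqDist here: word -> number of occurrences
word_freq = Counter(sample_words)

# Histogram keyed on the count: count -> how many distinct words occur that often
count_hist = defaultdict(int)
for word, count in word_freq.items():
    count_hist[count] += 1  # a missing key starts at 0, so no if/else is needed

print(dict(count_hist))
```

Keying on v gives a frequency-of-frequencies table (how many words occur once, twice, and so on), which is probably only what you want if the histogram is meant to be over counts rather than over words.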
That definitely has a bug, but so does the previous line.
    if key in word_hist:
        word_hist[v] = word_hist[v] + 1
    else:
        word_hist[v] = 1
should be
    if k in word_hist:
        word_hist[k] = word_hist[k] + 1
    else:
        word_hist[k] = 1
You don't need to take v from word_freq.
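A small sketch of the k-keyed increment, assuming the goal is the word-to-count mapping the question describes. Because the distinct keys of a frequency distribution are each visited only once, the loop here runs over the raw tokens instead. It assumes Python 3 and a made-up token list (sample_words is hypothetical), with collections.Counter standing in for FreqDist as a cross-check.

```python
from collections import Counter

# Hypothetical token list standing in for brown.words()
sample_words = ["a", "b", "a", "c", "a", "b"]

# Manual word -> count dictionary, testing and updating the same key k
word_hist = {}
for k in sample_words:
    if k in word_hist:
        word_hist[k] = word_hist[k] + 1
    else:
        word_hist[k] = 1

# Counter (used here in place of FreqDist) builds the same mapping in one step
word_freq = Counter(sample_words)

print(word_hist)
print(word_hist == dict(word_freq))
```

If the per-word counts are all that is needed, the frequency distribution already holds them, so the manual loop is mainly useful for seeing how the increment works.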