ValueError occurs when I try to use CG algorithm of MaxentClassifier in nltk

2023-02-27 18:39 Q&A author:

When I tried the examples of MaxentClassifier from http://nltk.googlecode.com/svn/trunk/doc/howto/classify.html, I got the error below:

Grad eval #0

Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    classifier = MaxentClassifier.train(train)
  File "C:\Python27\lib\site-packages\nltk\classify\maxent.py", line 323, in train
    gaussian_prior_sigma, **cutoffs)
  File "C:\Python27\lib\site-packages\nltk\classify\maxent.py", line 1456, in train_maxent_classifier_with_scipy
    model.fit(algorithm=algorithm)
  File "C:\Python27\lib\site-packages\scipy\maxentropy\maxentropy.py", line 1026, in fit
    return model.fit(self, self.K, algorithm)
  File "C:\Python27\lib\site-packages\scipy\maxentropy\maxentropy.py", line 226, in fit
    callback=callback)
  File "C:\Python27\lib\site-packages\scipy\optimize\optimize.py", line 636, in fmin_cg
    gfk = myfprime(x0)
  File "C:\Python27\lib\site-packages\scipy\optimize\optimize.py", line 176, in function_wrapper
    return function(x, *args)
  File "C:\Python27\lib\site-packages\scipy\maxentropy\maxentropy.py", line 420, in grad
    G = self.expectations() - self.K
ValueError: operands could not be broadcast together with shapes (54) (12)

Python Code:

from nltk.classify import MaxentClassifier

train = [(dict(a=1,b=1,c=1), 'y'),
         (dict(a=1,b=1,c=1), 'x'),
         (dict(a=1,b=1,c=0), 'y'),
         (dict(a=0,b=1,c=1), 'x'),
         (dict(a=0,b=1,c=1), 'y'),
         (dict(a=0,b=0,c=1), 'y'),
         (dict(a=0,b=1,c=0), 'x'),
         (dict(a=0,b=0,c=0), 'x')]
test = [(dict(a=1,b=0,c=1)), # unseen
        (dict(a=1,b=0,c=0)), # unseen
        (dict(a=0,b=1,c=1)), # seen 3 times, labels=y,y,x
        (dict(a=0,b=1,c=0)) # seen 1 time, label=x
        ]
classifier = MaxentClassifier.train(train)

But I don't know how to solve it. Any help would be appreciated, thanks!

It works if you set the algorithm explicitly, for example to 'GIS':

>>> algorithm = nltk.classify.MaxentClassifier.ALGORITHMS[0]
>>> algorithm
'GIS'
>>> classifier = nltk.MaxentClassifier.train(train, algorithm)

  ==> Training (100 iterations)

      Iteration    Log Likelihood    Accuracy
      ---------------------------------------
             1          -0.69315        0.556
             2          -0.65164        0.778
             3          -0.62713        0.778
             4          -0.61084        0.667
             5          -0.59935        0.667
             6          -0.59096        0.667
            .................................
            .................................

(Note that you left out one line of the training corpus.)
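For reference, here is a minimal sketch of the whole workflow with the algorithm set explicitly. It assumes an nltk install where 'GIS' is available. The ninth training line is inferred from the comment in the test set ("seen 3 times, labels=y,y,x"), and trace=0 and max_iter=10 are choices made here to keep the run short and quiet; they are not part of the original example.

```python
from nltk.classify import MaxentClassifier

train = [(dict(a=1, b=1, c=1), 'y'),
         (dict(a=1, b=1, c=1), 'x'),
         (dict(a=1, b=1, c=0), 'y'),
         (dict(a=0, b=1, c=1), 'x'),
         (dict(a=0, b=1, c=1), 'y'),
         (dict(a=0, b=0, c=1), 'y'),
         (dict(a=0, b=1, c=0), 'x'),
         (dict(a=0, b=0, c=0), 'x'),
         (dict(a=0, b=1, c=1), 'y')]  # assumed missing line (see note above)

test = [dict(a=1, b=0, c=1),  # unseen
        dict(a=1, b=0, c=0),  # unseen
        dict(a=0, b=1, c=1),  # seen 3 times, labels=y,y,x
        dict(a=0, b=1, c=0)]  # seen 1 time, label=x

# Name the algorithm explicitly instead of relying on the default.
# trace=0 silences the iteration table; max_iter is a training cutoff.
classifier = MaxentClassifier.train(train, algorithm='GIS',
                                    trace=0, max_iter=10)

predictions = []
for featureset in test:
    label = classifier.classify(featureset)
    pdist = classifier.prob_classify(featureset)
    predictions.append(label)
    print(featureset, label,
          'P(x)=%.4f' % pdist.prob('x'),
          'P(y)=%.4f' % pdist.prob('y'))
```

Passing the algorithm as a keyword argument makes it obvious which trainer is used, so the same script keeps working if the default changes between nltk versions.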

Edit: Several nltk algorithms fail, including 'CG'. The problem is probably the same as the one reported here. If so, it will probably be fixed in a future nltk release. You could also file a bug report with nltk, which would help both the developers and yourself.

Since the reported bug seems to be related to numpy broadcasting and outdated uses of numpy, you could also try an older version of numpy.
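To see what the error message itself means, the sketch below reproduces the same kind of ValueError with plain numpy. The two arrays are stand-ins with the lengths from the traceback (54 and 12); this only illustrates the broadcasting failure and says nothing about why scipy.maxentropy ends up with mismatched vectors.

```python
import numpy as np

# Stand-ins for self.expectations() and self.K from the traceback:
# two 1-D arrays whose lengths differ and neither of which is 1.
expectations = np.zeros(54)
K = np.zeros(12)

message = None
try:
    G = expectations - K  # same operation as in maxentropy.py's grad()
except ValueError as err:
    message = str(err)
    print(message)
```

Element-wise subtraction needs both arrays to have the same length (or one of them to have length 1), so the two vectors passed to the optimizer would have to be built over the same set of features for this line to succeed.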
