
Java Stanford NLP: ArrayIndexOutOfBounds after loading second lexicon

I am using the Stanford NLP toolkit. I have been trying to detect spelling errors with the lexicon's isKnown method, but it produces quite a few false positives. So I thought I would load a second lexicon and check words against that as well. However, loading the second one causes a problem.

private static LexicalizedParser lp = new LexicalizedParser(Constants.stdLexFile);
private static LexicalizedParser wsjLexParse = new LexicalizedParser(Constants.wsjLexFile);

static {
    lp.setOptionFlags(Constants.lexOptionFlags);
    wsjLexParse.setOptionFlags(Constants.lexOptionFlags);
}

public ParseTree(String input) throws IllegalArgumentException, IllegalAccessException, InvocationTargetException {
    initialInput = input;
    DocumentPreprocessor process = new DocumentPreprocessor();
    sentences = process.getSentencesFromText(new StringReader(input));

    for (List<? extends HasWord> sent : sentences) {
        if(lp.parse(sent)) { // line 65
            forest.add(lp.getBestParse()); //non determinism?
        }
    }

    partsOfSpeech = pos();
    runAnalysis();
}

This produces the following stack trace:

java.lang.ArrayIndexOutOfBoundsException: 45547
    at edu.stanford.nlp.parser.lexparser.BaseLexicon.initRulesWithWord(BaseLexicon.java:300)
    at edu.stanford.nlp.parser.lexparser.BaseLexicon.isKnown(BaseLexicon.java:160)
    at edu.stanford.nlp.parser.lexparser.BaseLexicon.ruleIteratorByWord(BaseLexicon.java:212)
    at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.initializeChart(ExhaustivePCFGParser.java:1299)
    at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.parse(ExhaustivePCFGParser.java:388)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.parse(LexicalizedParser.java:234)
    at nth.compling.ParseTree.<init>(ParseTree.java:65)
    at nth.compling.ParseTreeTest.constructor(ParseTreeTest.java:33)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.junit.internal.runners.BeforeAndAfterRunner.invokeMethod(BeforeAndAfterRunner.java:74)
    at org.junit.internal.runners.BeforeAndAfterRunner.runBefores(BeforeAndAfterRunner.java:50)
    at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:33)
    at org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)

If I comment out this line (and the other references to wsjLexParse):

private static LexicalizedParser wsjLexParse = new LexicalizedParser(Constants.wsjLexFile);

then everything works fine. What am I doing wrong here?


Looks like a bug in the Stanford library. You should report it to them.

Does the second lexicon work when you load only it (and not the other one)? Does the same error occur when you load the two lexica in different order?
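
One possible explanation, which I have not verified against the library source, is that both parsers share some process-wide word numbering while each lexicon sizes its own arrays when it is loaded. Loading the second lexicon would then hand out word ids that are larger than the first lexicon's arrays, which would fit an ArrayIndexOutOfBoundsException with a large index thrown from isKnown. The toy below is only a sketch of that failure mode: SharedIndexDemo, ToyLexicon and GLOBAL_WORD_INDEX are hypothetical names, and none of this is Stanford code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedIndexDemo {

    // Hypothetical stand-in for a word index shared by every lexicon in the JVM.
    static final Map<String, Integer> GLOBAL_WORD_INDEX = new HashMap<>();

    // Returns the id of a word, assigning the next free id if the word is new.
    static int indexOf(String word) {
        Integer id = GLOBAL_WORD_INDEX.get(word);
        if (id == null) {
            id = GLOBAL_WORD_INDEX.size();
            GLOBAL_WORD_INDEX.put(word, id);
        }
        return id;
    }

    // Toy lexicon (an assumption, not the real BaseLexicon): it sizes its
    // per-word table once, from the size of the shared index at load time.
    static class ToyLexicon {
        private final boolean[] seen;

        ToyLexicon(List<String> vocabulary) {
            for (String w : vocabulary) {
                indexOf(w);
            }
            seen = new boolean[GLOBAL_WORD_INDEX.size()];
            for (String w : vocabulary) {
                seen[indexOf(w)] = true;
            }
        }

        boolean isKnown(String word) {
            Integer id = GLOBAL_WORD_INDEX.get(word);
            if (id == null) {
                return false;
            }
            // No bounds check: fails if another lexicon grew the shared index.
            return seen[id];
        }
    }

    // Loads two toy lexicons and reports whether the first one now fails.
    static boolean reproduces() {
        GLOBAL_WORD_INDEX.clear();
        ToyLexicon first = new ToyLexicon(Arrays.asList("the", "cat"));
        // Loading the second lexicon adds ids 2 and 3 to the shared index.
        ToyLexicon second = new ToyLexicon(Arrays.asList("the", "dog", "barks"));
        try {
            first.isKnown("barks"); // id 3, but first.seen has length 2
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("first lexicon fails after second load: " + reproduces());
    }
}
```

If something like this is what happens in your version, swapping the load order should move the failure rather than remove it, and loading either lexicon alone should work. That is why the two questions above are worth testing before you file a bug report.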
