Sphinx 4 corrupted ARPA LM?
I have an ARPA LM generated by kylm. When I run SPHINX, I get this exception stack trace:
Exception in thread "main" java.lang.RuntimeException: Allocation of search manager resources failed
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:242)
at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:87)
at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:168)
at transcribing.Main.main(Main.java:78)
Caused by: java.io.IOException: Corrupt Language Model file:./corpus.arpa at line 2420:Premature EOF
at edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel.corrupt(SimpleNGramModel.java:458)
at edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel.readLine(SimpleNGramModel.java:404)
at edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel.load(SimpleNGramModel.java:307)
at edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel.allocate(SimpleNGramModel.java:110)
at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:342)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:238)
... 3 more
Java Result: 1
Here's an excerpt of the ARPA LM:
[n]
3
[smoother]
kylm.model.ngram.smoother.KNSmoother
[closed]
true
[max_length]
1091
[vocab_cutoff]
0
[start_symbol]
<s>
[terminal_symbol]
</s>
[unknown_symbol]
<unk>
\data\
ngram 1=406
ngram 2=768
ngram 3=937
\1-grams:
-99.0000 <s> -0.3630
...
...
\end\
PS: there is a newline after \end\
The exception says that SPHINX is encountering an unexpected EOF on the last line (isn't it supposed to encounter an EOF there?)
Any help would be appreciated!
It turns out to be a SPHINX 4 bug.
If the \1-grams: directive (or any other directive, actually) contained trailing space[s], SimpleNGramModel failed to parse it!
I just submitted the patch, but you can find it here.
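Until a patched SPHINX build is available, one possible workaround is to strip the trailing whitespace from the ARPA file before loading it. The following is only a minimal sketch of that idea, not the submitted patch: the class name ArpaTrailingSpaceFix, its method names, and the assumption that the file is UTF-8 encoded are all hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of SPHINX or kylm): rewrites an ARPA LM
// so that no line, including directives such as "\1-grams:", ends in whitespace.
final class ArpaTrailingSpaceFix {

    // Remove trailing whitespace (spaces, tabs, carriage returns) from one line.
    static String stripTrailing(String line) {
        int end = line.length();
        while (end > 0 && Character.isWhitespace(line.charAt(end - 1))) {
            end--;
        }
        return line.substring(0, end);
    }

    // Apply stripTrailing to every line, preserving line order.
    static List<String> clean(List<String> lines) {
        List<String> cleaned = new ArrayList<>(lines.size());
        for (String line : lines) {
            cleaned.add(stripTrailing(line));
        }
        return cleaned;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ArpaTrailingSpaceFix <input.arpa> <output.arpa>");
            return;
        }
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        // Assumption: the ARPA file is UTF-8 encoded; adjust the charset if it is not.
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        Files.write(output, clean(lines), StandardCharsets.UTF_8);
    }
}
```

Running it on corpus.arpa and pointing the language model location at the cleaned copy should sidestep the trailing-space problem described above, assuming nothing else in the file trips the parser.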