Fraktur recognition with OCRopus/Tesseract on Linux
I am trying to recognize a German text set in a Fraktur typeface with ocropus, but it doesn't seem to be using the deu-f package.
Here are the steps I performed.
- Compiled and installed tesseract and ocropus.
- Downloaded http://tesseract-ocr.googlecode.com/files/tesseract-2.01.deu-f.tar.gz, unpacked it to tessdata/.
But when I call
$ ocroscript recognize --tessLanguage=deu-f --output-mode=text image.png
the results are the same as when I call
$ ocroscript recognize --tessLanguage=eng --output-mode=text image.png
Any ideas what the problem is?
The problem is described in http://code.google.com/p/ocropus/issues/detail?id=87. You just need to apply the patch to ocropus and rebuild it.
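Before patching, it may be worth ruling out the simpler cause, namely that the deu-f files did not end up in the tessdata/ directory that tesseract actually reads. The following is only a sketch: the helper name count_lang_files and the default path /usr/local/share/tessdata are assumptions for illustration, so substitute the directory your build uses.

```shell
# Count the files in a tessdata directory that belong to one language pack.
# Usage: count_lang_files <tessdata-dir> <language-prefix>
# (count_lang_files is a hypothetical helper, not part of tesseract or ocropus.)
count_lang_files() {
    dir=$1
    lang=$2
    count=0
    for f in "$dir/$lang".*; do
        # The glob stays literal when nothing matches, so test for existence.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Assumed location; adjust to wherever your tessdata/ actually lives.
TESSDATA_DIR=${TESSDATA_DIR:-/usr/local/share/tessdata}
count_lang_files "$TESSDATA_DIR" deu-f
```

If this prints 0, the archive was probably unpacked somewhere else. If it prints a non-zero count, the data is in place and the issue linked above is the more likely explanation.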