Tesseract appears to be learning characters as I perform more OCRs. How do I save the learning data between uses?
I have a particular set of 10 images to run OCR on. They contain only digits and are fairly short, about 20 digits per image. There is one particular image that has some mismatches if I run it first; however, if I run the other images first and then come back to that one, all characters match.
I am inclined to conclude that Tesseract is learning the characters as more OCR operations are performed, which makes me very happy. Now the question is whether it's possible for me to save the learning data, so that Tesseract knows to pick it up the next time I use it.
You can set classify_save_adapted_templates to 1 in your Tesseract config file to save the adapted templates, and set classify_use_pre_adapted_templates to 1 to load those templates the next time you run Tesseract.
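A Tesseract config file is plain text with one "variable value" pair per line, so the two settings above can be generated and passed on the command line. The sketch below is only an illustration under several assumptions: that your Tesseract build still uses the adaptive classifier these variables control (they live in the legacy classify code linked below), and that your tesseract binary accepts a path to a config file as the argument after the output base. The helper names and the file name adapt.cfg are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# The two variables from the answer; "1" turns each behaviour on.
# Assumption: the installed Tesseract version honours both variables.
ADAPTIVE_VARS = {
    "classify_save_adapted_templates": "1",
    "classify_use_pre_adapted_templates": "1",
}


def config_text(variables: dict[str, str]) -> str:
    """Render variables as config-file text: one 'name value' pair per line."""
    return "".join(f"{name} {value}\n" for name, value in variables.items())


def write_config(path: Path, variables: dict[str, str] = ADAPTIVE_VARS) -> Path:
    """Write the config file (hypothetical helper) and return its path."""
    path.write_text(config_text(variables))
    return path


def tesseract_command(image: str, output_base: str, config: Path) -> list[str]:
    """Build the argv list: tesseract <image> <outputbase> <configfile>."""
    return ["tesseract", image, output_base, str(config)]


def run_ocr(image: str, output_base: str, config: Path) -> str:
    """Run tesseract with the config and return the recognised text.

    Assumes tesseract writes its result to <output_base>.txt.
    """
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract is not on PATH")
    subprocess.run(tesseract_command(image, output_base, config), check=True)
    return Path(output_base + ".txt").read_text()


if __name__ == "__main__":
    cfg = write_config(Path("adapt.cfg"))  # hypothetical file name
    print(tesseract_command("digits.png", "out", cfg))
```

Keeping the command construction separate from the subprocess call lets you inspect or log the exact invocation before running it on each of the images in turn.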
The code that specifies the behavior of these options is here: http://code.google.com/p/tesseract-ocr/source/browse/trunk/classify/classify.cpp?r=570