What are the options for an embedded/scriptable OCR engine? [closed]
Closed 7 years ago.
I am working on a Python/Django web application and I need to extract text from scanned documents (for search indexing).
What options are there for OCR engines? I know of Tesseract, but I am not entirely satisfied with the results. The problem could perhaps be solved by more extensive pre-processing (rotation, level adjustment, etc.).
Requirements:
- Should not require manual tuning (other than initial tuning)
- Preferably open source; alternatively, it should be possible to buy a "liberal" license
- Either a Python module or a command-line program (or a C library that I can turn into a command-line program :) )
Alternatively:
- A good library that does image pre-processing so that an existing engine like Tesseract will perform better.
Tesseract itself can optionally be compiled with Leptonica, a library with a pretty exhaustive set of image manipulation functions (I'm not sure whether Tesseract uses it for anything beyond supporting formats other than basic TIF). A thorough list of features can be found on the website. The project author, Dan Bloomberg, has written a few papers on image preprocessing for OCR, which might also be of interest to you. You can find them with a Google search restricted to site:http://www.leptonica.com/papers/
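For what it's worth, here is a rough sketch of the "pre-process, then run the command-line engine" idea from Python. It is only an illustration under a few assumptions: it uses Pillow (not Leptonica) for the level adjustment and rotation, it assumes a `tesseract` executable is on the PATH, and the function names (`preprocess`, `build_tesseract_command`, `ocr`) are made up for this example. The right rotation angle and contrast cutoff depend on your scans.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def preprocess(src: Path, dst: Path, angle: float = 0.0) -> Path:
    """Grayscale, stretch levels, and optionally rotate a scan before OCR."""
    # Pillow is a third-party package (pip install Pillow). It is imported
    # here so the rest of the module still loads without it.
    from PIL import Image, ImageOps

    with Image.open(src) as img:
        gray = ImageOps.grayscale(img)
        # Level adjustment: stretch the histogram, ignoring the darkest and
        # brightest 1% of pixels (the cutoff value is a guess, tune it once).
        leveled = ImageOps.autocontrast(gray, cutoff=1)
        if angle:
            # Positive angles rotate counter-clockwise; the corners exposed
            # by the rotation are filled with white.
            leveled = leveled.rotate(angle, expand=True, fillcolor=255)
        leveled.save(dst)
    return dst


def build_tesseract_command(image: Path, lang: str = "eng") -> list[str]:
    """Build the argument list for the tesseract command-line program."""
    # Passing "stdout" as the output base makes tesseract print the
    # recognized text instead of writing a .txt file.
    return ["tesseract", str(image), "stdout", "-l", lang]


def ocr(image: Path, lang: str = "eng") -> str:
    """Run tesseract on an image file and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With hypothetical file names, usage would look like `text = ocr(preprocess(Path("scan.png"), Path("scan_clean.png"), angle=1.5))`. Keeping the pre-processing separate from the OCR call makes it easy to swap in a different engine or a Leptonica-based cleanup step later.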