".SO" file for Tesseract OCR

I need to use the ".so" file of Tesseract OCR (Optical Character Recognition) in my Android app. Can anyone explain how to get the ".so" file for Tesseract OCR?

I tried importing the complete project, but that is not working.


The README that comes with tesseract-android-tools is going to be your best friend. I used Ubuntu 11.04 in VirtualBox. Within Ubuntu (I assume you'll want to transfer these to Windows later):

1) Download the Android NDK

2) Check out the tesseract-android-tools project with SVN. I used tesseract-android-tools-read-only/tesseract-android-tools/ as the $PROJECT directory, FYI.

3) Use ndk-build (detailed in the README) to build Tesseract. Doing so creates a libs folder within $PROJECT, containing the three .so files you need.

I believe there's a way to do it with Cygwin but I'm not sure how, as I already had the VM ready to use.
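To confirm that step 3 actually produced the native libraries before moving them to another machine, something like the following could be used. This is only a minimal sketch in plain Java: the class name NativeLibCheck and the default "libs" path are made up for illustration, and the ABI subfolder that ndk-build creates inside libs depends on your NDK configuration, so the code simply searches the whole folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NativeLibCheck {
    /** Returns every *.so file found anywhere under the given libs directory, sorted by path. */
    static List<Path> findSharedObjects(Path libsDir) throws IOException {
        try (Stream<Path> paths = Files.walk(libsDir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".so"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the real $PROJECT/libs path as the first argument.
        Path libsDir = Paths.get(args.length > 0 ? args[0] : "libs");
        for (Path so : findSharedObjects(libsDir)) {
            System.out.println(so);
        }
    }
}
```

If this prints fewer than three files, the ndk-build step most likely did not finish cleanly and is worth re-running before importing the project into Eclipse.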

From there (using instructions at http://code.google.com/p/tesseract-android-tools/updates/list):

4) tesseract-android-tools is actually a library and ships with an Eclipse .project file, so after building the .so files with the NDK, just import that project into Eclipse and build it.

5) Mark it as a library project: http://developer.android.com/guide/developing/projects/projects-eclipse.html#SettingUpLibraryProject

6) Now, in the same workspace, create a new Android project, i.e. your app. Go to its properties and reference the library project you set up in step 5 ( http://developer.android.com/guide/developing/projects/projects-eclipse.html#ReferencingLibraryProject )

7) Build your app against Android 2.2 or later: http://code.google.com/p/tesseract-android-tools/issues/detail?id=5#c16

And it should work!

Note that you must be using Android 2.2 or higher. Hope that helps!!


@raju: I was having the same problem as you. After searching for solutions I found this: http://gaut.am/making-an-ocr-android-app-using-tesseract/

I don't know if your case is like mine, but I'm developing with Eclipse under Windows. The blog post linked above says that the build can't be done under Windows, so you must use Linux (for example, Ubuntu in a virtual machine). The post also explains in detail the steps that need to be done.

@jmiles I ran "ndk-build" under Ubuntu and then transferred the result to Windows. I built Tesseract and made it a library project; however, I always get errors when trying to recognize characters. These are some of the log messages:

04-04 14:32:28.569: E/2130968577(561): java.lang.IllegalArgumentException: Data path must contain subfolder tessdata!
04-04 14:32:28.569: E/2130968577(561): at com.googlecode.tesseract.android.TessBaseAPI.init(TessBaseAPI.java:167)

@jmiles, @CommonsWare: do you have any idea what the problem is?


You would need to install the Android NDK, convert the Tesseract stuff into an NDK extension, and add it to your Java app via JNI. This is unlikely to be easy. You cannot just take an .so for, say, Linux and put it in your project.


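You need to download the language data file for Tesseract and put it into a folder named 'tessdata'. Then initialize Tesseract with the folder that contains 'tessdata' (as the exception message says, the data path must contain the subfolder tessdata), like this:

TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init("/path/to/folder/containing/tessdata/", "eng"); // placeholder path; the second argument is the language, mostly "eng"

It should work then.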
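The IllegalArgumentException quoted earlier in the thread suggests that init rejects a data path without a tessdata subfolder, so it can help to check the folder layout before calling it. Below is a rough sketch in plain Java. The class TessDataCheck, the method describeProblem, and the /sdcard/tesseract path are all hypothetical, and the file name pattern <language>.traineddata is an assumption based on how Tesseract 3.x names its language files.

```java
import java.io.File;

public class TessDataCheck {
    /**
     * Returns null when dataPath looks usable as the first argument of
     * TessBaseAPI.init(dataPath, language), otherwise a short description
     * of what is missing.
     */
    static String describeProblem(File dataPath, String language) {
        // The data path itself must CONTAIN a folder named "tessdata".
        File tessdata = new File(dataPath, "tessdata");
        if (!tessdata.isDirectory()) {
            return "missing subfolder: " + tessdata.getPath();
        }
        // Assumption: language data is stored as <language>.traineddata.
        File trained = new File(tessdata, language + ".traineddata");
        if (!trained.isFile()) {
            return "missing language file: " + trained.getPath();
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical path; pass the real data path as the first argument.
        File dataPath = new File(args.length > 0 ? args[0] : "/sdcard/tesseract");
        String problem = describeProblem(dataPath, "eng");
        System.out.println(problem == null ? "data path looks OK" : problem);
    }
}
```

In an app, the same check could run right before baseApi.init(...), so a missing folder shows up as a readable message instead of a crash.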
