
Is Tesseract 3.00 multi-threaded?

I've read other posts suggesting that multi-threading support would be added in 3.00, but I'm not sure whether it actually made it into the release.

Other than multi-threading, is running multiple tesseract processes a feasible way to achieve concurrency?

Thanks.


One thing I've done is use GNU Parallel to run as many tesseract instances as a multi-core system can handle, on multi-page documents that have been converted to single-page images.

GNU Parallel is a short program that compiles easily on most Linux distros (I'm using openSUSE 11.4).

Here's the command line that I use:

/usr/local/bin/parallel -j 4 \
   /usr/local/bin/tesseract -psm 1 -l eng {} {.} \
   ::: /tmp/tmp/*.jpg

The -j 4 option tells parallel to run up to four jobs at once, one for each of the four CPU cores on my server.

If you run this and watch 'top' in another terminal, you'll see up to four tesseract processes at a time until parallel has worked through all of the JPGs in the specified directory.

As long as the job count matches the core count, the load from these jobs should stay at or below the number of CPU cores in your system (if you run Linux).
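
If you don't want to hard-code the 4, here is a minimal sketch that takes the job count from the machine. It assumes nproc (from GNU coreutils) is installed and falls back to 1 otherwise. The names njobs and ocr_dir are just made up for illustration, and the tesseract options are copied from my command line above.

```shell
# Number of jobs = number of available CPU cores (assumes GNU coreutils' nproc;
# falls back to a single job if nproc is missing).
njobs=$(nproc 2>/dev/null || echo 1)

# Hypothetical helper: OCR every .jpg in the directory given as $1.
# Output files land next to the images, because {.} is the input path
# without its extension and tesseract adds .txt to that base name.
ocr_dir() {
    parallel -j "$njobs" tesseract -psm 1 -l eng {} {.} ::: "$1"/*.jpg
}

# Usage:
#   ocr_dir /tmp/tmp
```

Nothing runs until you call ocr_dir, so you can paste this into a shell and check the value of njobs first.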

Here's the link to GNU Parallel:

http://www.gnu.org/software/parallel/


No. You can browse the code at http://code.google.com/p/tesseract-ocr/source/browse/ and see for yourself. None of the current code in trunk seems to make use of multi-threading (at least judging by the base classes, the API, and the neural network classes).


I used parallel as well, on CentOS, this way:

ls | parallel --gnu "tesseract {} {.}"

I added the --gnu option because parallel suggested it in this warning:

parallel: Warning: YOU ARE USING --tollef. IF THINGS ARE ACTING WEIRD USE --gnu.

The {} and {.} are placeholders for parallel: in this case you're telling tesseract to use the listed file as its first argument and the same file name without its extension as its second argument. Everything is well explained in the parallel man page.
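
To see what the two placeholders turn into without running anything, here is a small sketch in plain shell. The strip_ext helper and the file names are made up for illustration. It only approximates {.} with the shell's own suffix removal, which is close enough for simple file names in the current directory.

```shell
# Approximate GNU Parallel's {.} placeholder: drop the last extension.
# (Hypothetical helper; not part of parallel or tesseract.)
strip_ext() {
    printf '%s\n' "${1%.*}"
}

# Print the command parallel would build for each (made-up) input file.
for f in scan01.tif page.2.jpg; do
    # {} is the input as given; {.} is the same name minus its extension.
    echo "tesseract $f $(strip_ext "$f")"
done
```

This prints "tesseract scan01.tif scan01" and "tesseract page.2.jpg page.2". Since tesseract appends .txt to the output base, the results end up in scan01.txt and page.2.txt.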

Now, if you have, say, three .tif files, run tesseract three times, once per file, and sum the execution times. Then run the command above with time in front of parallel, and you can easily check the speedup.
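
A rough sketch of that comparison, assuming bash (where time is a keyword and can time a shell function) and a directory of .tif files. The function names run_sequential and run_parallel are made up for illustration.

```shell
# Sequential baseline: one tesseract process at a time.
# Each file is OCRed into <name without extension>.txt.
run_sequential() {
    for f in "$@"; do
        tesseract "$f" "${f%.*}"
    done
}

# Parallel version: the same work handed to GNU Parallel.
run_parallel() {
    parallel --gnu tesseract {} {.} ::: "$@"
}

# Usage (in bash):
#   time run_sequential *.tif
#   time run_parallel *.tif
```

Compare the "real" figures of the two runs. How much faster the second one is will depend on how many cores you have and how many files there are.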
