Is tesseract 3.00 multi-threaded?
I read some other posts suggesting that they would add multi-threading support in 3.00. But I'm not sure if it's added in 3.00 when it was released.
Other 开发者_开发知识库than multi-threading, is running multiple processes of tesseract a feasible option to achieve concurrency?
Thanks.
One thing I've done is invoked GNU Parallel to run as many instances of Tess* as able on a multi-core system for multi-page documents converted to single page images.
It's a short program, easily compiled on most Linux distros (I'm using OpenSuSE 11.4).
Here's the command line that I use:
/usr/local/bin/parallel -j 4 \
/usr/local/bin/tesseract -psm 1 -l eng {} {.} \
::: /tmp/tmp/*.jpg
The -j 4 tells parallel to use all four CPU cores that I have on a server.
If you run this, and in another terminal do a 'top,' you'll see up to four processes at one time until it rummages through all of the JPG's in the directory specified.
Your load should never exceed the number of CPU cores in your system (if you run Linux).
Here's the link to GNU Parallel:
http://www.gnu.org/software/parallel/
No. You can browse the code in http://code.google.com/p/tesseract-ocr/source/browse/ None of the current code in trunk seems to make use of multi-threading. (at least looking through the base classes, api, and neural networking classes)
I did use parallel
as well, on a Centos, this way:
ls | parallel --gnu "tesseract {} {.}"
I used the --gnu
option as suggested from the stdout log which was:
parallel: Warning: YOU ARE USING --tollef. IF THINGS ARE ACTING WEIRD USE --gnu.
the {}
and {.}
are placeholders for parallel: in this case you're telling tesseract to use the file listed as first argument, and the same file name without extension as second argument - everything is well explained in parallel man pages.
Now, if you have - say - three .tif
files and you run tesseract
three times, one for each file, summing up the execution time, and then you run the command above with time
before parallel
, you can easily check the speedup.
精彩评论