
Recognizing numbers using PHP

I’m trying to extract numbers in the range 1-99 from a picture. I’ve tried several OCR methods using PHP, but my script eventually fails because the numbers are occasionally rotated 5% to the left or right, which makes them unrecognizable.

I’ve now installed Ocropus (http://code.google.com/p/ocropus/) as a test. Unfortunately, it does not give me the correct numbers every time, which makes me think that my pictures are not optimized enough.

Does anyone have tips or ideas on how to improve the readability of the numbers? I would also be grateful for ideas on how to locate the numbers in the picture.


It seems that Tesseract / Ocropus are getting confused by the skew, and it could be that multiple skewed numbers on the same line are making things worse.

Are you passing in the whole image as a grid of numbers? Have you tried sending each box (number) individually as a separate image to the OCR engine? You may find you get better results.
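As a rough illustration of splitting the grid into per-number images, here is a minimal Python sketch that only computes the crop rectangles. It assumes the numbers sit in a regular grid of equally sized boxes, which the question does not confirm. The function name `grid_cells` and the `margin` parameter are hypothetical; the same integer arithmetic should port directly to PHP.

```python
def grid_cells(width, height, rows, cols, margin=0):
    """Return crop boxes (left, top, right, bottom), row by row.

    Assumes a regular grid covering the whole image (an assumption,
    not something known about the original pictures). `margin` shrinks
    each box on every side so grid lines are less likely to be included.
    """
    if rows <= 0 or cols <= 0 or margin < 0:
        raise ValueError("rows and cols must be positive, margin non-negative")

    boxes = []
    for r in range(rows):
        # Integer division keeps the boxes contiguous even when the
        # image size is not an exact multiple of the grid size.
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            if right - left <= 2 * margin or bottom - top <= 2 * margin:
                raise ValueError("margin is too large for the cell size")
            boxes.append((left + margin, top + margin,
                          right - margin, bottom - margin))
    return boxes


if __name__ == "__main__":
    # Example: a 300x200 image holding 2 rows of 3 numbers.
    for box in grid_cells(300, 200, rows=2, cols=3, margin=5):
        print(box)
```

Each rectangle can then be cropped out with whatever image library is in use and passed to the OCR engine as its own image, so that one skewed number cannot disturb the line detection for its neighbours.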

Have you tried any other OCR engines? Do you require it to be open source?

I ran the image through a cheaper commercial OCR engine and all numbers were recognised correctly. So another option is to wrap a commercial OCR engine in a small C# or C++ interface, which can be done quite quickly and should deliver improved results.


Is it acceptable to use an external (web-based) API for your solution? If so, please consider http://www.wisetrend.com/wisetrend_ocr_cloud.shtml (a REST API for OCR)

It can automatically correct for image rotation. Try tweaking the Deskew and AnalysisMode parameters described in http://www.wisetrend.com/WiseTREND_Online_OCR_API_v2.0.htm

(Also, when using the API, make sure that the image resolution is correctly set in the input image header - it can make all the difference in recognition quality).
