Is there an OCR library that outputs coordinates of words found within an image? [closed]

2023-02-11 16:23 问答作者：

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.

Closed 1 year ago开发者_如何学C.

Improve this question

In my experience, OCR libraries tend to merely output the text found within an image but not where the text was found. Is there an OCR library that outputs both the words found within an image as well as the coordinates (x, y, width, height) where those words were found?

Most commercial OCR engines will return word and character coordinate positions but you have to work with their SDK's to extract the information. Even Tesseract OCR will return position information but it has been not easy to get to. Version 3.01 will make easier but a DLL interface is still being worked on.

Unfortunately, most free OCR programs use Tesseract OCR in its basic form and they only report the raw ASCII results.

www.transym.com - Transym OCR - outputs coordinates. www.rerecognition.com - KADMOS engine returns coordinates.

Also Caere Omnipage, Mitek, Abbyy, Charactell return character positions.

I'm using TessNet (a Tesseract C# wrapper) and I'm getting word coordinates with the following code:

TextWriter tw = new StreamWriter(@"U:\user files\bwalker\ocrTesting.txt");
Bitmap image = new Bitmap(@"u:\user files\bwalker\2849257.tif");
tessnet2.Tesseract ocr = new tessnet2.Tesseract();
// If digit only
ocr.SetVariable("tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.,$-/#&=()\"':?");
// To use correct tessdata
ocr.Init(@"C:\Users\bwalker\Documents\Visual Studio 2010\Projects\tessnetWinForms\tessnetWinForms\bin\Release\", "eng", false); 
List<tessnet2.Word> result = ocr.DoOCR(image, System.Drawing.Rectangle.Empty);
string Results = "";
foreach (tessnet2.Word word in result)
{
    Results += word.Confidence + ", " + word.Text + ", " +word.Top+", "+word.Bottom+", "+word.Left+", "+word.Right+"\n";
}
using (StreamWriter writer = new StreamWriter(@"U:\user files\bwalker\ocrTesting2.txt", true))
{
    writer.WriteLine(Results);//+", "+word.Top+", "+word.Bottom+", "+word.Left+", "+word.Right);
    writer.Close();
}
MessageBox.Show("Completed");

You can use the hocr "configfile" with tesseract like so:

tesseract syllabus-page1.jpg syllabus-page1 hocr

This will output a mostly HTML5 document with elements like:

<div class='ocr_page' id='page_1' title='image "syllabus-page1.jpg"; bbox 0 0 2531 3272; ppageno 0'>
  <div class="ocr_carea" id="block_1_4" title="bbox 265 1183 2147 1778">
    <p class="ocr_par" dir="ltr" id="par_1_8" title="bbox 274 1305 655 1342">
      <span class="ocr_line" id="line_1_14" title="bbox 274 1305 655 1342; baseline -0.005 0; x_size 46.378059; x_descenders 10.378059; x_ascenders 12">
        <span class="ocrx_word" id="word_1_78" title="bbox 274 1307 386 1342; x_wconf 90" lang="eng" dir="ltr">needs</span>
        <span class="ocrx_word" id="word_1_79" title="bbox 402 1318 459 1342; x_wconf 90" lang="eng" dir="ltr">are</span>
        <span class="ocrx_word" id="word_1_80" title="bbox 474 1305 655 1341; x_wconf 86" lang="eng" dir="ltr">different:</span>
      </span>
    </p>
    ...
  </div>  
  ...
</div>

While I'm pretty sure that's not how you're supposed to use XML, I found it easier than digging into the tesseract API.

P.S. I realize that several comments and answers allude to this solution, but none of them actually show how to use the hocr option or describe the output you get from that.

Google Vision API does this. https://cloud.google.com/vision/docs/detecting-text

"description": "Wake up human!\n",
      "boundingPoly": {
        "vertices": [
          {
            "x": 29,
            "y": 394
          },
          {
            "x": 570,
            "y": 394
          },
          {
            "x": 570,
            "y": 466
          },
          {
            "x": 29,
            "y": 466
          }
        ]
      }

You may also take a look at Gamera framework (http://gamera.informatik.hsnr.de/) it is a set of tools, which allows you to build your own OCR engine. Nevertheless the fastest way is to use Tesseract or OCRopus hOCR (http://en.wikipedia.org/wiki/HOCR) output.

For Java Developers:

I will recommend for this you to use Tesseract and Tess4j.

You can actually find an example on how to find words on a Image in one of the tests of Tess4j.

https://github.com/nguyenq/tess4j/blob/master/src/test/java/net/sourceforge/tess4j/TessAPITest.java#L449-L517

public void testResultIterator() throws Exception {
    logger.info("TessBaseAPIGetIterator");
    File tiff = new File(this.testResourcesDataPath, "eurotext.tif");
    BufferedImage image = ImageIO.read(new FileInputStream(tiff)); // require jai-imageio lib to read TIFF
    ByteBuffer buf = ImageIOHelper.convertImageData(image);
    int bpp = image.getColorModel().getPixelSize();
    int bytespp = bpp / 8;
    int bytespl = (int) Math.ceil(image.getWidth() * bpp / 8.0);
    api.TessBaseAPIInit3(handle, datapath, language);
    api.TessBaseAPISetPageSegMode(handle, TessPageSegMode.PSM_AUTO);
    api.TessBaseAPISetImage(handle, buf, image.getWidth(), image.getHeight(), bytespp, bytespl);
    ETEXT_DESC monitor = new ETEXT_DESC();
    TimeVal timeout = new TimeVal();
    timeout.tv_sec = new NativeLong(0L); // time > 0 causes blank ouput
    monitor.end_time = timeout;
    ProgressMonitor pmo = new ProgressMonitor(monitor);
    pmo.start();
    api.TessBaseAPIRecognize(handle, monitor);
    logger.info("Message: " + pmo.getMessage());
    TessResultIterator ri = api.TessBaseAPIGetIterator(handle);
    TessPageIterator pi = api.TessResultIteratorGetPageIterator(ri);
    api.TessPageIteratorBegin(pi);
    logger.info("Bounding boxes:\nchar(s) left top right bottom confidence font-attributes");
    int level = TessPageIteratorLevel.RIL_WORD;

    // int height = image.getHeight();
    do {
        Pointer ptr = api.TessResultIteratorGetUTF8Text(ri, level);
        String word = ptr.getString(0);
        api.TessDeleteText(ptr);
        float confidence = api.TessResultIteratorConfidence(ri, level);
        IntBuffer leftB = IntBuffer.allocate(1);
        IntBuffer topB = IntBuffer.allocate(1);
        IntBuffer rightB = IntBuffer.allocate(1);
        IntBuffer bottomB = IntBuffer.allocate(1);
        api.TessPageIteratorBoundingBox(pi, level, leftB, topB, rightB, bottomB);
        int left = leftB.get();
        int top = topB.get();
        int right = rightB.get();
        int bottom = bottomB.get();
        /******************************************/
        /* COORDINATES AND WORDS ARE PRINTED HERE */
        /******************************************/
        System.out.print(String.format("%s %d %d %d %d %f", word, left, top, right, bottom, confidence));
        // logger.info(String.format("%s %d %d %d %d", str, left, height - bottom, right, height - top)); //
        // training box coordinates

        IntBuffer boldB = IntBuffer.allocate(1);
        IntBuffer italicB = IntBuffer.allocate(1);
        IntBuffer underlinedB = IntBuffer.allocate(1);
        IntBuffer monospaceB = IntBuffer.allocate(1);
        IntBuffer serifB = IntBuffer.allocate(1);
        IntBuffer smallcapsB = IntBuffer.allocate(1);
        IntBuffer pointSizeB = IntBuffer.allocate(1);
        IntBuffer fontIdB = IntBuffer.allocate(1);
        String fontName = api.TessResultIteratorWordFontAttributes(ri, boldB, italicB, underlinedB, monospaceB,
                serifB, smallcapsB, pointSizeB, fontIdB);
        boolean bold = boldB.get() == TRUE;
        boolean italic = italicB.get() == TRUE;
        boolean underlined = underlinedB.get() == TRUE;
        boolean monospace = monospaceB.get() == TRUE;
        boolean serif = serifB.get() == TRUE;
        boolean smallcaps = smallcapsB.get() == TRUE;
        int pointSize = pointSizeB.get();
        int fontId = fontIdB.get();
        logger.info(String.format("  font: %s, size: %d, font id: %d, bold: %b,"
                + " italic: %b, underlined: %b, monospace: %b, serif: %b, smallcap: %b", fontName, pointSize,
                fontId, bold, italic, underlined, monospace, serif, smallcaps));
    } while (api.TessPageIteratorNext(pi, level) == TRUE);

    assertTrue(true);
}

ABCocr.NET (our component) will allow you to obtain the coordinates of each word found. The values are accessible through the Word.Bounds property, which simply returns a System.Drawing.Rectangle.

The example below shows how you can OCR an image using ABCocr.NET and output the information you need:

using System;
using System.Drawing;
using WebSupergoo.ABCocr3;

namespace abcocr {
    class Program {
        static void Main(string[] args) {

            Bitmap bitmap = (Bitmap)Bitmap.FromFile("example.png");
            Ocr ocr = new Ocr();
            ocr.SetBitmap(bitmap);

            foreach (Word word in ocr.Page.Words) {
                Console.WriteLine("{0}, X: {1}, Y: {2}, Width: {3}, Height: {4}",
                    word.Text,
                    word.Bounds.X,
                    word.Bounds.Y,
                    word.Bounds.Width,
                    word.Bounds.Height);
            }
        }
    }
}

Disclosure: posted by a member of the WebSupergoo team.

hocr is a one of the output format of tesseract OCR engine,which has both word and it's coordinates and also has some additional info like confident level of word recognition.

继续阅读：ocr

Is there an OCR library that outputs coordinates of words found within an image? [closed]

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？