
How to retrieve/read existing OCR data in .tif files using Java?

I want to read existing OCR data from .tif files using Java. The OCR data was created with MS Office Document Image Writer. I have searched for open source libraries, but I couldn't find any library or tool that can read the attached OCR data.

How can I get this OCR data from .tif files using Java?


OCR data created with MS Office Document Image Writer, along with the other metadata, can be retrieved using ExifTool.

Example:

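import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// The statements below belong inside a method.
// cmdLineInput[0] = fully qualified path to the ExifTool executable
// cmdLineInput[1] = -ee (extract embedded) option to extract data from multi-page .tif files
// cmdLineInput[2] = fully qualified path to the .tif file
String[] cmdLineInput = { "C:\\ExifTool\\exif.exe", "-ee",
        "C:\\images\\example.tif" };
ProcessBuilder processBuilder = new ProcessBuilder(cmdLineInput);
// Merge stderr into stdout so the child process cannot block on an unread error stream.
processBuilder.redirectErrorStream(true);

try {
    Process exif = processBuilder.start();
    try (BufferedReader brInput = new BufferedReader(new InputStreamReader(
            exif.getInputStream()))) {
        String outputLine;
        while ((outputLine = brInput.readLine()) != null) {
            System.out.println(outputLine);
        }
    }
    exif.waitFor();
} catch (IOException ioe) {
    // handle exception
} catch (InterruptedException ie) {
    // waitFor() was interrupted; restore the interrupt flag
    Thread.currentThread().interrupt();
}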

You can parse the data from each outputLine inside the loop and store it in an object for further handling, for example to save it in a database.
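As a rough sketch of that parsing step: ExifTool's default output prints one tag per line in the form "Tag Name : value", so each line can be split at the first colon. The class name ExifOutputParser and its methods are hypothetical helpers, not part of ExifTool or of the code above. Because -ee can report the same tag once per page, the sketch keeps every value for a tag in a list. Which tag actually carries the OCR text is not assumed here; check the output for your own files.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collects ExifTool's "Tag Name : value" lines into a map.
final class ExifOutputParser {

    // Splits one line at the first colon; returns null if the line has no colon.
    static String[] parseLine(String line) {
        int idx = line.indexOf(':');
        if (idx < 0) {
            return null;
        }
        String tag = line.substring(0, idx).trim();
        String value = line.substring(idx + 1).trim();
        return new String[] { tag, value };
    }

    // Groups values by tag name, preserving the order in which tags first appear.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> tags = new LinkedHashMap<>();
        for (String line : lines) {
            String[] kv = parseLine(line);
            if (kv != null) {
                tags.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        return tags;
    }
}
```

To use it with the loop above, add each outputLine to a List<String> instead of printing it, then pass the list to ExifOutputParser.parse once the process has finished.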
