MODI leaking memory
I have an app that uses MODI 2007 to OCR several multi-page TIFF files. When I run it on a directory that contains several good TIFFs along with some TIFFs that cannot be opened in Windows Picture and Fax Viewer, MODI also fails to OCR those "bad" TIFFs. When this happens, the app is unable to reclaim any of the memory that MODI used while trying to OCR them. After the tool has tried to OCR too many of these "bad" TIFFs, the machine runs out of memory and the app crashes. I have tried several code fixes from the web that supposedly fix MODI memory leaks, but so far none have worked for me. The part of the code that does the OCR is pasted below:
StringBuilder strRecText = new StringBuilder(10000);
MODI.Document doc1 = null;
MODI.Images images = null;
try
{
    doc1 = new MODI.Document();
    doc1.Create(name);
    doc1.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true); // this will OCR all pages of a multi-page TIFF file
    images = doc1.Images;
    for (int imageCounter = 0; imageCounter < images.Count; imageCounter++)
    {
        if (imageCounter > 0 && !noPageBreakFlag)
        {
            strRecText.Append((char)pageBreakChar);
        }
        MODI.Image image = null;
        MODI.Layout layout = null;
        try
        {
            image = (MODI.Image)images[imageCounter];
            layout = image.Layout;
            strRecText.Append(layout.Text);
        }
        finally
        {
            // release the per-page COM wrappers even if reading this page fails
            if (layout != null)
            {
                System.Runtime.InteropServices.Marshal.FinalReleaseComObject(layout);
            }
            if (image != null)
            {
                System.Runtime.InteropServices.Marshal.FinalReleaseComObject(image);
            }
        }
    }
    File.AppendAllText(ocrFile, strRecText.ToString()); // write the OCR file out to disk
}
catch (Exception)
{
    // OCR failed (for example on a "bad" TIFF); nothing is written for this file
}
finally
{
    // clean up on both the success path and the failure path
    if (images != null)
    {
        System.Runtime.InteropServices.Marshal.FinalReleaseComObject(images);
        images = null;
    }
    if (doc1 != null)
    {
        try
        {
            doc1.Close(false);
        }
        catch (System.Runtime.InteropServices.COMException)
        {
            // Close can fail when Create or OCR already failed; still release the wrapper
        }
        System.Runtime.InteropServices.Marshal.FinalReleaseComObject(doc1);
        doc1 = null;
    }
    GC.Collect();
    GC.WaitForPendingFinalizers();
}
I've been working on a project using MODI for the last few months. MODI has by far been the most accurate OCR engine I've tried, but it has some major problems with releasing resources and with crashing.
I ended up building a command-line app that takes the path to an image as a command-line parameter, saves the resulting text to a file, and quits. Any software that needs MODI functionality then calls this command-line application. It sounds like an odd solution, but it is a very simple and straightforward way to solve MODI's memory leak issues: when the command-line process exits, its memory is freed by the operating system, so you don't have to worry about your application crashing or resources not being cleaned up. I have found that the time it takes to fire up the command-line exe and then read the file it creates is insignificant compared to the time it takes to actually OCR the image, so you are not losing much in the way of performance.
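The calling side of that arrangement might look roughly like the sketch below. It is only an illustration, not the code I actually use: the class and method names (OcrWorkerClient, BuildStartInfo, OcrFile), the worker exe path, the idea of passing the output text file as a second argument, and the "non-zero exit code means failure" convention are all assumptions. Only the System.Diagnostics.Process and System.IO calls are real framework APIs.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class OcrWorkerClient
{
    // Builds the start info for one worker run. Both paths are quoted so that
    // paths containing spaces survive command-line parsing.
    // Assumed (hypothetical) worker usage: worker.exe <tiffPath> <textPath>
    public static ProcessStartInfo BuildStartInfo(string workerExe, string tiffPath, string textPath)
    {
        return new ProcessStartInfo
        {
            FileName = workerExe,
            Arguments = "\"" + tiffPath + "\" \"" + textPath + "\"",
            UseShellExecute = false, // start the exe directly, without the shell
            CreateNoWindow = true
        };
    }

    // Runs the worker on one TIFF and returns the recognized text, or null if
    // the worker timed out or reported failure. Process.Start throws if the
    // worker exe itself cannot be started.
    public static string OcrFile(string workerExe, string tiffPath, int timeoutMs)
    {
        string textPath = Path.GetTempFileName();
        try
        {
            using (Process worker = Process.Start(BuildStartInfo(workerExe, tiffPath, textPath)))
            {
                if (!worker.WaitForExit(timeoutMs))
                {
                    try
                    {
                        worker.Kill(); // a hung OCR run is discarded along with its memory
                    }
                    catch (InvalidOperationException)
                    {
                        // the worker exited on its own just before Kill
                    }
                    worker.WaitForExit();
                    return null;
                }
                if (worker.ExitCode != 0)
                {
                    return null; // assumed convention: non-zero exit code means OCR failed
                }
            }
            return File.ReadAllText(textPath);
        }
        finally
        {
            File.Delete(textPath); // does nothing if the file is already gone
        }
    }
}
```

A caller would loop over the directory and call something like OcrWorkerClient.OcrFile(workerExePath, tiffPath, 120000) for each file, skipping the ones that return null. Whatever MODI leaks on a "bad" TIFF then lives only as long as that one worker process.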