Read OLE2 file in Java without buffering into memory?
I'm using Apache POI to read an OLE2 file (might be Word, might be Excel). Using POIFSFileSystem, I'm able to open the file, and read the contents. That bit's all fine.
However, it does seem to be using quite a bit of memory. Looking at a few bits of POIFS, it seems that various bits of the file get buffered into memory, sometimes more than once.
Is it possible to just read bits in from the File, without loading it all in at once? I notice that with the new file formats (ooxml), you have a choice between a File and an InputStream, and the docs list the File constructor as lower memory. Is there something similar for the older OLE2 POIFS?
I'm using POI 3.7 Final in case that matters!
You're in luck, it can be done, but alas you'll need to upgrade to a beta release - the code went in after 3.7 Final. You should be ok with 3.8 beta 2, but you might want to wait for 3.8 beta 3 if you can as the code's still being worked on.
What you'll need to do is switch from using a POIFSFileSystem to an NPOIFSFileSystem. The N prefix is for the new NIO-based OLE2 code, which is more memory efficient when using a stream, and much more memory efficient when using a File. See the NPOIFSFileSystem docs for more details.
Your code will want to be something like:
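import java.io.File;
import java.io.IOException;
import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.NPOIFSFileSystem;

// This is the most memory efficient way to open the FileSystem
NPOIFSFileSystem fs;
try {
    fs = new NPOIFSFileSystem(new File(filename));
} catch (IOException e) {
    // an I/O error occurred, or the File did not provide a compatible
    // POIFS data structure; rethrow so that fs is never used unassigned
    throw new RuntimeException("Could not open " + filename, e);
}
DirectoryEntry root = fs.getRoot();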
In 3.8 beta 2, most of the POIDocument classes (HSSFWorkbook etc.) will accept a DirectoryEntry in their constructor, so you can read them from an NPOIFSFileSystem. Write support isn't quite finished, though, so you'll need to stick with a POIFSFileSystem (with its higher memory footprint) if you need to write back out.
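If it helps to see why opening from a File can be so much lighter than reading a whole stream, here is a rough standalone sketch of the general NIO idea: a positional read through a FileChannel pulls in only the byte range you ask for. This is not POI's actual code, and the class and method names (Ole2HeaderPeek, readRange, looksLikeOle2) are made up for illustration. The only format detail it relies on is the 8-byte signature at the start of an OLE2 file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical helper, not part of POI: illustrates positional reads with NIO.
public class Ole2HeaderPeek {
    // The 8-byte signature found at the start of an OLE2 compound document
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Reads exactly length bytes starting at offset; only that range is buffered. */
    public static byte[] readRange(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            long position = offset;
            while (buffer.hasRemaining()) {
                // Positional read: does not move the channel's own position
                int n = channel.read(buffer, position);
                if (n < 0) {
                    throw new IOException("File ends before offset " + (offset + length));
                }
                position += n;
            }
            return buffer.array();
        }
    }

    /** Returns true if the file starts with the OLE2 signature. */
    public static boolean looksLikeOle2(Path file) throws IOException {
        if (Files.size(file) < OLE2_SIGNATURE.length) {
            return false;
        }
        return Arrays.equals(readRange(file, 0, OLE2_SIGNATURE.length), OLE2_SIGNATURE);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java Ole2HeaderPeek <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + " looks like OLE2: " + looksLikeOle2(file));
    }
}
```

Running it against a .doc or .xls only ever holds the requested bytes on the heap, however large the file is. A stream-based reader generally has to consume everything before a given offset to reach it.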