Reduce Java memory usage when parsing/saving/caching multiple files

Application flow

An input file consists of multiple logical documents.

  1. Extract one input logical document.
  2. Parse the elements within the document.
  3. Build an XML document from the input logical document.
  4. Write that document back to a physical file.

What would be a good way to reduce memory needs?

Right now, I keep all the logical documents from a physical file in an ArrayList so that I do all the I/O at once. But when I write each logical document to a stream after processing, I get a Java heap space error after 20,000 logical documents. The input contains about 100,000 logical documents, and I am looking for an efficient way to process and write all of them.


Don't keep everything in memory. Instead, read from and write to disk as you go. For instance:

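// Requires: import java.io.*;
// Document, readDocument, buildXml and write stand in for your own code.
void split(File inputFile, File outputFile) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(inputFile));
    OutputStream os = new BufferedOutputStream(new FileOutputStream(outputFile));
    for (;;) {
        Document doc = readDocument(is);   // read the next logical document
        if (doc == null) break;            // null signals end of input
        write(buildXml(doc), os);          // write it out before reading the next one
    }
    os.close();
    is.close();
}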

(You'll obviously want to add error handling)

That way, only one logical document is in memory at any given time.
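
To make the loop above concrete, here is a minimal runnable sketch. The question doesn't say how logical documents are delimited, so this assumes a separator line containing only "---", and buildXml here is just a placeholder that wraps the escaped text in a single element. The class name StreamingSplitter and the helper names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class StreamingSplitter {
    // Assumption: a line containing only "---" separates logical documents.
    static final String SEPARATOR = "---";

    // Reads the next logical document, or returns null at end of input.
    static String readDocument(BufferedReader in) throws IOException {
        StringBuilder text = new StringBuilder();
        boolean sawLine = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.equals(SEPARATOR)) {
                return text.toString();
            }
            if (sawLine) {
                text.append('\n');
            }
            text.append(line);
            sawLine = true;
        }
        return sawLine ? text.toString() : null;
    }

    // Placeholder for the real XML building: wraps the escaped text in one element.
    static String buildXml(String doc) {
        String escaped = doc.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return "<document>" + escaped + "</document>\n";
    }

    // Processes the input one document at a time and returns how many were written.
    static int process(BufferedReader in, Writer out) throws IOException {
        int count = 0;
        String doc;
        while ((doc = readDocument(in)) != null) {
            out.write(buildXml(doc)); // written right away; nothing is collected in a list
            count++;
        }
        out.flush();
        return count;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("first\n---\nsecond <2>\n"));
        StringWriter out = new StringWriter();
        int count = process(in, out);
        System.out.println(count + " documents written:");
        System.out.print(out);
    }
}
```

With real files you would pass a BufferedReader over a FileReader and a BufferedWriter over a FileWriter instead of the in-memory streams used in main.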


Your problem is not the number of I/O operations but the amount of memory you need. If the logical documents are large, holding them all in memory will exhaust the heap before any real processing gets done.

So,

  1. Work with one logical document at a time: load it into memory, build the XML, and write it to disk.
  2. Try not to load a document fully into memory; load only a part of it at a time.
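
For the second point, one possible approach (my suggestion, not something the question mentions) is the StAX XMLStreamWriter from javax.xml.stream, which writes XML piece by piece instead of building a whole tree first. This is only a sketch: the element names "document" and "element" are invented, and the Iterator stands in for whatever produces your parsed elements.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class IncrementalXmlWriter {
    // Writes one logical document, emitting each element as soon as it is available.
    static void writeDocument(Iterator<String> elements, Writer out) throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement("document");
        while (elements.hasNext()) {
            xml.writeStartElement("element");
            xml.writeCharacters(elements.next()); // escapes characters such as & and <
            xml.writeEndElement();
        }
        xml.writeEndElement();
        xml.flush();
        xml.close(); // closes the XMLStreamWriter, not the underlying Writer
    }

    public static void main(String[] args) throws XMLStreamException {
        StringWriter out = new StringWriter();
        writeDocument(Arrays.asList("x", "a & b").iterator(), out);
        System.out.println(out);
    }
}
```

Because each element is written as soon as it is read, memory use depends on the size of one element rather than on the size of the whole document.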


You might consider writing the output to a physical file with some kind of separator between items, instead of serializing an ArrayList into the file. Alternatively, write each item to its own file, then concatenate the files and put a header at the beginning that says how many items the file contains and/or which part of the file corresponds to which serialized item. This method is harder to code, though, and is more of an advanced, painful approach. Java applications tend to consume a lot of memory, and there is only so much you can do about that.
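
A simpler variant of the header idea is to put a length in front of every item rather than one header up front. Note that this is a swapped-in technique, not exactly what is described above. A rough sketch using DataOutputStream and DataInputStream follows; the class and method names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class LengthPrefixedRecords {
    // Writes a 4-byte length followed by the record's UTF-8 bytes.
    static void writeRecord(DataOutputStream out, String record) throws IOException {
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads the next record, or returns null when no further length prefix can be read.
    static String readRecord(DataInputStream in) throws IOException {
        int length;
        try {
            length = in.readInt();
        } catch (EOFException endOfInput) {
            return null;
        }
        byte[] bytes = new byte[length];
        in.readFully(bytes); // throws EOFException if the record is truncated
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        writeRecord(out, "<document>one</document>");
        writeRecord(out, "<document>two</document>");
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        String record;
        while ((record = readRecord(in)) != null) {
            System.out.println(record); // only one record is held at a time
        }
    }
}
```

The reader never needs more than one item in memory, and it can find item boundaries without scanning for a separator that might also occur inside the data.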


Use the jvisualvm memory profiler in the Sun Java 6 JDK to find out where your memory leak is.
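
If attaching a profiler isn't convenient, a crude complement (not a replacement for jvisualvm) is to log heap usage from inside the processing loop using the standard Runtime methods. The class name HeapLogger and the reporting interval below are arbitrary choices for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class HeapLogger {
    // Heap currently in use, in bytes (allocated heap minus its free part).
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Short summary of used heap versus the maximum the JVM may grow to.
    static String describeHeap() {
        long mb = 1024L * 1024L;
        return "used=" + (usedHeapBytes() / mb) + " MB, max="
                + (Runtime.getRuntime().maxMemory() / mb) + " MB";
    }

    public static void main(String[] args) {
        // Simulates the pattern from the question: every document is kept in a list.
        List<byte[]> documents = new ArrayList<byte[]>();
        for (int i = 1; i <= 2000; i++) {
            documents.add(new byte[10 * 1024]); // stand-in for one logical document
            if (i % 500 == 0) {
                System.out.println("after " + i + " documents: " + describeHeap());
            }
        }
    }
}
```

If the used figure keeps climbing with the document count, something is holding on to every document, which is exactly what the ArrayList in the question does.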
