
Decode large base64 from XML in Java: OutOfMemoryError

I need to decode a base64-encoded element of an XML file and write it into a separate file. The problem: the file can easily reach 100 MB, and every solution I tried ended with "java.lang.OutOfMemoryError: Java heap space". Reading the XML in general and the decoding itself are not the problem; the size of the base64 block is.

I used JDOM, dom4j and XMLStreamReader to access the XML file. However, as soon as I access the base64 content of the element, I get the error mentioned above. I also tried an XSLT using Saxon's base64Binary-to-octets function, but of course with the same result.

Is there a way to stream this base64-encoded part into a file without holding the whole chunk in memory at once?

Thanks for your hints,

Andreas


Apache Commons Codec has a Base64OutputStream, which should let you feed the data through in a scalable way by chaining the Base64OutputStream with a FileOutputStream.

All you need is the element's text as a String, so you may not have to build a DOM structure at all.

Something like:

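import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.binary.Base64OutputStream;

// doEncode = false puts Base64OutputStream in DECODE mode (the default is ENCODE),
// so the base64 text written here ends up as raw bytes in the file.
try (Writer writer = new OutputStreamWriter(
        new Base64OutputStream(
           new BufferedOutputStream(
              new FileOutputStream("/path/to/my/file")),
           false),
        StandardCharsets.US_ASCII)) {
   writer.write(myXml); // must hold only the element's base64 text
}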

If the input XML file is too big, then you should read chunks of it into a buffer in a loop, writing the buffer contents to the output (i.e. a standard reader-to-writer copy).
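A minimal sketch of such a reader-to-writer copy, using only the JDK. The class name `ChunkedCopy` and the buffer size are illustrative assumptions; in the scenario above, the `Writer` would be the one wrapping the decoding Base64OutputStream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class ChunkedCopy {

    // Copies everything from reader to writer through one fixed-size buffer,
    // so memory use stays constant no matter how large the input is.
    static long copy(Reader reader, Writer writer, int bufferSize) throws IOException {
        char[] buffer = new char[bufferSize];
        long total = 0;
        int n;
        while ((n = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, n);
            total += n;
        }
        writer.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        long copied = copy(new StringReader("SGVsbG8sIFdvcmxkIQ=="), out, 8);
        System.out.println(copied + " chars: " + out); // prints 20 chars: SGVsbG8sIFdvcmxkIQ==
    }
}
```

Only `bufferSize` characters are in memory at any moment; the total input size no longer matters.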


I don't think any XML API lets you access an element's text as a stream rather than a String. If the String is 100 MB, your only option is probably to increase the JVM's heap size until the OutOfMemoryError goes away:

java -Xmx256m your.class.Name
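To confirm that the -Xmx setting actually took effect, a tiny check like the following (a hypothetical `HeapCheck` class, JDK only) can be run with the same flags. Keep in mind that a 100 MB base64 String, plus the roughly 75 MB of decoded bytes, may need noticeably more than 256 MB; the right value is something to measure rather than assume.

```java
public class HeapCheck {

    // Maximum heap the JVM will attempt to use, in megabytes.
    // With -Xmx256m this should be close to 256; the exact figure varies by JVM and GC.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Run it as `java -Xmx256m HeapCheck` and compare the printed value with the flag.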


Try the StAX API. For large text elements you should get several text events, which you need to push into a streaming Base64 implementation (like the one skaffman mentioned).
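A minimal JDK-only sketch of that idea, assuming the target element is identified by its local name and is not nested inside itself. Instead of Commons Codec it uses a small hand-rolled incremental decoder: whitespace is dropped, and every complete 4-character group is decoded with `java.util.Base64`, since base64 groups decode independently. The class and method names are hypothetical. Whether a huge text node really arrives as several smaller events depends on the StAX implementation; `IS_COALESCING` must stay false.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxBase64Extract {

    // Decodes the base64 text of the first <elementName> element and writes the bytes to out.
    // Only one chunk of text (plus at most 3 leftover chars) is held at a time.
    static void extract(XMLStreamReader xml, String elementName, OutputStream out)
            throws XMLStreamException, IOException {
        StringBuilder pending = new StringBuilder();
        char[] chunk = new char[8192];
        boolean inside = false;
        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT && xml.getLocalName().equals(elementName)) {
                inside = true;
            } else if (inside && (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA)) {
                // Copy the event's text piecewise instead of calling getText().
                for (int src = 0; ; ) {
                    int n = xml.getTextCharacters(src, chunk, 0, chunk.length);
                    feed(chunk, n, pending, out);
                    src += n;
                    if (n < chunk.length) break;
                }
            } else if (inside && event == XMLStreamConstants.END_ELEMENT && xml.getLocalName().equals(elementName)) {
                if (pending.length() != 0) {
                    throw new IOException("Truncated base64 data: " + pending.length() + " leftover chars");
                }
                out.flush();
                return;
            }
        }
    }

    // Appends non-whitespace chars and decodes every complete 4-char base64 group.
    private static void feed(char[] buf, int len, StringBuilder pending, OutputStream out) throws IOException {
        for (int i = 0; i < len; i++) {
            if (!Character.isWhitespace(buf[i])) {
                pending.append(buf[i]);
            }
        }
        int usable = pending.length() - pending.length() % 4;
        if (usable > 0) {
            out.write(Base64.getDecoder().decode(pending.substring(0, usable)));
            pending.delete(0, usable);
        }
    }

    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
        String doc = "<root><data>SGVsbG8s\n  IFdvcmxkIQ==</data></root>";
        XMLStreamReader xml = factory.createXMLStreamReader(new StringReader(doc));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        extract(xml, "data", out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.US_ASCII)); // prints Hello, World!
    }
}
```

For the real case, create the reader from a `FileInputStream` and pass a `BufferedOutputStream` around a `FileOutputStream` as `out`.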


If your file can get that big, never use a DOM parser. Use a simple SAX approach to access the data elements, and stream the base64 data into a Base64OutputStream as mentioned above.
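A minimal SAX sketch of the "access the data elements" part, assuming the element is matched by its qName and does not nest inside itself. The hypothetical handler forwards each `characters` chunk to a `Writer` and never builds a String of the whole element. To keep it JDK-only and testable, the decoding step is left to whatever `Writer` you pass in.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxElementText extends DefaultHandler {
    private final String elementName;
    private final Writer sink;
    private boolean inside;

    SaxElementText(String elementName, Writer sink) {
        this.elementName = elementName;
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(elementName)) inside = true;
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals(elementName)) {
            inside = false;
            try {
                sink.flush();
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (!inside) return;
        try {
            sink.write(ch, start, length); // forwards just this chunk, nothing accumulates
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Parses the document and streams the text of every <elementName> element into sink.
    static void extract(InputSource source, String elementName, Writer sink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(source, new SaxElementText(elementName, sink));
    }

    public static void main(String[] args) throws Exception {
        StringWriter captured = new StringWriter();
        String doc = "<root><name>x</name><data>SGVsbG8=</data></root>";
        extract(new InputSource(new StringReader(doc)), "data", captured);
        System.out.println(captured); // prints SGVsbG8=
    }
}
```

For the actual file, the sink could be `new OutputStreamWriter(new Base64OutputStream(new BufferedOutputStream(new FileOutputStream(path)), false), StandardCharsets.US_ASCII)`, so the Commons Codec stream does the decoding in DECODE mode.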


As lbruder said, use a SAX parser to read the document in a streaming fashion. If you use Base64OutputStream, you have to set the flag that makes it DECODE instead of the default ENCODE. You also have to convert the char array from the characters callback to a byte array before passing it to the output stream, which needs additional memory allocations and copies.
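The char-to-byte step described above can be sketched as follows, assuming the base64 text is pure ASCII (so each char maps to exactly one byte). The hypothetical `AsciiChunkWriter` reuses one byte buffer, so the allocation happens once rather than on every callback; the extra copy remains. In practice `out` would be a Base64OutputStream in DECODE mode.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class AsciiChunkWriter {
    private final OutputStream out;
    private final byte[] buffer; // reused for every call, so no per-callback allocation

    AsciiChunkWriter(OutputStream out, int bufferSize) {
        this.out = out;
        this.buffer = new byte[bufferSize];
    }

    // Narrows each char to one byte; valid only because base64 text is pure ASCII.
    void write(char[] ch, int start, int length) throws IOException {
        int end = start + length;
        while (start < end) {
            int n = Math.min(buffer.length, end - start);
            for (int i = 0; i < n; i++) {
                char c = ch[start + i];
                if (c > 127) {
                    throw new IOException("Non-ASCII character in base64 data: " + (int) c);
                }
                buffer[i] = (byte) c;
            }
            out.write(buffer, 0, n);
            start += n;
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        AsciiChunkWriter writer = new AsciiChunkWriter(bytes, 4);
        char[] callbackChars = "..SGVsbG8=..".toCharArray();
        writer.write(callbackChars, 2, 8); // as a SAX characters(ch, 2, 8) call would pass it
        System.out.println(new String(bytes.toByteArray(), StandardCharsets.US_ASCII)); // prints SGVsbG8=
    }
}
```

Inside the SAX handler, `characters(ch, start, length)` would simply call `asciiWriter.write(ch, start, length)`.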

I wrote an alternative base64 decoder for exactly this use case; it is available on GitHub. Here is an example of how to use it:

Base64StreamDecoder decoder = new Base64StreamDecoder();
OutputStream out;

...

public void startElement(String uri, String localName, String qName, Attributes atts) {
    decoder.reset();
    out = new BufferedOutputStream(new FileOutputStream(...));
}

public void endElement(String uri, String localName, String qName) {
    decoder.checkComplete();
    out.close();
}

public void characters(char[] ch, int start, int length) {
    decoder.decode(ch, start, length, out);
}