
How to access a subset of XML data in Java when the XML data is too large to fit in memory?

What I would really like is a streaming API that works partly like StAX and partly like DOM/JDOM.

It would be streaming in the sense that it would be very lazy and not read things in until needed. It would also be streaming in the sense that it would read everything forwards (but not backwards).

Here's what code that used such an API would look like.

URL url = ...
XMLStream xml = XXXFactory(url.openStream());


// process each <book> element in this document.
// the <book> element may have subnodes.
// You get a DOM/JDOM like tree rooted at the next <book>.


while (xml.hasContent()) {
  XMLElement book = xml.getNextElement("book");
  processBook(book);
}

Does anything like this exist?


You could do the following:

  1. Scan the XML file using SAX or StAX and immediately serialize everything back into a StringBuilder, i.e. create your own copy of the XML.

  2. If you encounter an endElement event and you know you don't need the subtree you just parsed, clear the StringBuilder.

  3. If you need it, you can build a DOM tree from the "copy" you created.

With this you can fall back on standard frameworks: one for conventional SAX parsing and one for conventional DOM building. Only the custom serialization might require some hacking.

It also helps if you know the subtree boundaries in advance (the book elements in your example). Otherwise further processing would be required.
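A minimal sketch of this idea, with one substitution: instead of serializing each subtree into a StringBuilder and re-parsing it, it uses StAX to find each book start tag and hands the reader to the JDK identity Transformer via StAXSource, which copies just that subtree into a small DOM. The class name BookStreamer, the method streamBooks, and the Consumer callback are made up for illustration. The element name "book" is taken from the question.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: streams a large document and materializes only one
// <book> subtree at a time as a DOM element.
class BookStreamer {

    static int streamBooks(Reader in, Consumer<Element> processBook) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        // A Transformer created without a stylesheet performs an identity copy.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        int count = 0;
        while (true) {
            if (reader.isStartElement() && "book".equals(reader.getLocalName())) {
                // Copy only this subtree into a fresh DOM document.
                DOMResult result = new DOMResult();
                identity.transform(new StAXSource(reader), result);
                Document doc = (Document) result.getNode();
                processBook.accept(doc.getDocumentElement());
                count++;
                // The transform may already have advanced the reader past
                // </book>, so re-check the current event before calling next().
                continue;
            }
            if (!reader.hasNext()) {
                break;
            }
            reader.next();
        }
        reader.close();
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book><title>A</title></book>"
                   + "<book><title>B</title></book></library>";
        int n = streamBooks(new StringReader(xml),
                book -> System.out.println(book.getTextContent()));
        System.out.println(n + " books processed");
    }
}
```

Only one book subtree is held in memory at a time. The rest of the document is read forward and discarded, so this assumes each individual subtree is small enough to fit in memory.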


To parse part of the document without loading all of it into memory, use a streaming parser such as SAX (StAX is the pull-based alternative).

Here are some official Sun examples of how to use SAX: http://java.sun.com/developer/codesamples/xml.html#sax
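For a rough idea of what SAX-based processing looks like, here is a small sketch that reacts to each book start tag as the parser streams through the input, without ever building a tree. The class name SaxBookCounter and its methods are hypothetical. The element name "book" comes from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts <book> elements while the parser streams
// through the document. Nothing is retained except the counter.
class SaxBookCounter extends DefaultHandler {
    private int books = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // The default SAXParserFactory is not namespace-aware, so the
        // element name arrives in qName.
        if ("book".equals(qName)) {
            books++;
        }
    }

    int getBooks() {
        return books;
    }

    static int countBooks(InputSource source) throws Exception {
        SaxBookCounter handler = new SaxBookCounter();
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return handler.getBooks();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book/><magazine/><book/></library>";
        System.out.println(countBooks(new InputSource(new StringReader(xml))));
    }
}
```

The trade-off is that SAX pushes events at you, so any per-book state (current title, nesting depth, and so on) has to be tracked by hand in the handler.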
