Examining XML in a stream
If I have a large XML document that I don't want to load entirely into memory, and a configurable value such as an XPath expression (or some other format) that identifies the path to an element in the XML, is it possible to read the XML from a stream node by node until I find the location I am looking for?
We need to build a facility that pulls a value out of XML without knowing the schema. All we have is the XML document and an XPath expression. We could probably switch to something other than XPath, but we really want to avoid loading the whole document: we need to process in real time, the XML can be fairly large, and the volume could get high.
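For the simplest case, where the "path" is a plain absolute child path like /order/customer/name (no predicates, wildcards or attributes), you don't need a full XPath engine at all. Here is a minimal sketch using Python's standard-library ElementTree.iterparse. The function name find_first is hypothetical, and the sketch assumes un-namespaced tags (namespaced tags appear as "{uri}local"). It keeps a stack of open element names and stops reading as soon as the stack equals the requested path:

```python
import io
import xml.etree.ElementTree as ET


def find_first(source, path):
    """Hypothetical helper: stream `source` and return the text of the first
    element whose absolute path equals `path`, e.g. "/order/customer/name".

    Only plain child steps are supported (no predicates, wildcards or
    attributes). Returns None if nothing matches.
    """
    target = [step for step in path.split("/") if step]
    stack = []  # names of the currently open elements, outermost first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if stack == target:
                # The element is complete here, so its text is available.
                # Returning stops the parse without reading the rest.
                return elem.text
            stack.pop()
            # Drop the finished element's content to keep memory small;
            # an empty shell stays attached to its parent.
            elem.clear()
    return None


if __name__ == "__main__":
    doc = b"<order><id>7</id><customer><name>Ada</name></customer></order>"
    print(find_first(io.BytesIO(doc), "/order/customer/name"))  # prints Ada
```

Because matching happens on the "end" event, only the current chain of ancestors (plus cleared siblings) is held in memory, and the parse ends as soon as the first match is found.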
libxml2 provides a streaming API (the xmlTextReader interface, which parses a document a piece at a time) and also XPath. Mixing the two isn't as straightforward as with the standard DOM parser, but it's possible on a per-element basis: the reader can expand the current node into a subtree, which XPath can then query. See here for more info: http://xmlsoft.org/xmlreader.html#Mixing
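The same "stream, then run XPath on one element at a time" idea can be sketched without libxml2. The following is a Python standard-library analogue, not libxml2's API. It assumes the document is a sequence of repeating record elements (the record tag and the function name query_each are hypothetical). Each record is built as a small tree, queried with ElementTree's limited XPath subset, then cleared:

```python
import io
import xml.etree.ElementTree as ET


def query_each(source, record_tag, relative_path):
    """Hypothetical helper: yield elem.findtext(relative_path) for every
    <record_tag> element, parsing the document as a stream.

    Only one record's subtree is fully populated at a time.
    """
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            # The whole record has been parsed; query it like a small DOM.
            yield elem.findtext(relative_path)
            # Free the record's children and text before moving on; the
            # emptied element stays attached to its parent until the end.
            elem.clear()


if __name__ == "__main__":
    doc = b"""<orders>
      <order id="1"><customer><name>Ada</name></customer></order>
      <order id="2"><customer><name>Linus</name></customer></order>
    </orders>"""
    for name in query_each(io.BytesIO(doc), "order", "customer/name"):
        print(name)
```

ElementTree only understands a subset of XPath (child steps, `//`, simple predicates such as `[@id='2']`). If you need full XPath 1.0 per record, lxml (which is built on libxml2) offers the same iterparse loop plus an `xpath()` method on each element.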
You can do this with Saxon-EE. The simplest approach is probably XQuery document projection; see http://www.saxonica.com/documentation/sourcedocs/projection.xml
Try XMLDog: http://code.google.com/p/jlibs/wiki/XMLDog
XMLDog can evaluate XPath expressions using SAX (i.e. without loading the whole document into memory).
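To illustrate the SAX approach (this is a Python sketch of the general idea, not XMLDog's Java API), the handler below checks several simple absolute paths in a single pass over the document. The class and function names are hypothetical, paths are limited to plain child steps, and element names are compared as raw qualified names because namespace processing is left off:

```python
import io
import xml.sax


class PathCollector(xml.sax.ContentHandler):
    """Hypothetical handler: collects the direct text of every element whose
    absolute path (e.g. "/feed/item/title") is in `paths`, in one pass."""

    def __init__(self, paths):
        super().__init__()
        self.targets = {p: tuple(s for s in p.split("/") if s) for p in paths}
        self.results = {p: [] for p in paths}
        self.stack = []   # names of the currently open elements
        self.frames = []  # per open element: (matched paths, text chunks)

    def startElement(self, name, attrs):
        self.stack.append(name)
        here = tuple(self.stack)
        matched = [p for p, steps in self.targets.items() if steps == here]
        self.frames.append((matched, []))

    def characters(self, content):
        # SAX may split one text node into several calls, so buffer them.
        if self.frames:
            self.frames[-1][1].append(content)

    def endElement(self, name):
        matched, chunks = self.frames.pop()
        self.stack.pop()
        if matched:
            text = "".join(chunks)
            for p in matched:
                self.results[p].append(text)


def evaluate(source, paths):
    """Parse `source` (filename or binary stream) once; return path -> texts."""
    handler = PathCollector(paths)
    xml.sax.parse(source, handler)
    return handler.results


if __name__ == "__main__":
    doc = b"<feed><item><title>A</title><price>3</price></item></feed>"
    print(evaluate(io.BytesIO(doc), ["/feed/item/title", "/feed/item/price"]))
```

Memory use is proportional to the nesting depth plus the matched text, so it stays flat regardless of document size. Supporting predicates or `//` would mean replacing the exact tuple comparison with a small matcher per path.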