Processing large XML files
I have a large XML file that contains many sub-elements, and I want to be able to run some XPath queries against it. I tried using VTD-XML in Java, but I sometimes get an OutOfMemoryError because the XML is too large to fit into memory. Is there an alternative way of processing such large XML files?
Try http://code.google.com/p/jlibs/wiki/XMLDog
It evaluates XPaths using SAX, without creating an in-memory representation of the XML document.
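A rough sketch of how that might look; the class and method names (XMLDog, addXPath, sniff, getResult) are recalled from the jlibs wiki and may differ between versions, and the file name and XPath are placeholders:

```java
import jlibs.xml.DefaultNamespaceContext;
import jlibs.xml.sax.dog.XMLDog;
import jlibs.xml.sax.dog.XPathResults;
import jlibs.xml.sax.dog.expr.Expression;
import org.xml.sax.InputSource;

public class XmlDogExample {
    public static void main(String[] args) throws Exception {
        // Declare any namespace prefixes your XPaths use; none are needed here
        DefaultNamespaceContext nsContext = new DefaultNamespaceContext();
        XMLDog dog = new XMLDog(nsContext);

        // Register all XPaths before sniffing; they are evaluated in a single SAX pass
        Expression xpath = dog.addXPath("/orders/order/@id");

        // Streams the document instead of building an in-memory tree
        XPathResults results = dog.sniff(new InputSource("huge.xml"));

        System.out.println(results.getResult(xpath));
    }
}
```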
A SAX parser is very efficient when working with large files.
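For illustration, a minimal handler using the standard javax.xml.parsers SAX API; the element name "order" and the file name are made up:

```java
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class CountElements {
    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        // Callbacks fire as the parser streams through the file,
        // so memory use stays roughly constant regardless of document size.
        parser.parse(new File("huge.xml"), new DefaultHandler() {
            private int count;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if ("order".equals(qName)) {
                    count++;
                }
            }

            @Override
            public void endDocument() {
                System.out.println("order elements: " + count);
            }
        });
    }
}
```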
What are you trying to do right now? By the sound of it, you are using a DOM-based parser, which loads the entire XML file into memory as a DOM representation. If you are dealing with a large file, you'd be better off using a SAX parser, which processes the XML document in a streaming fashion.
I personally recommend StAX for this.
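As a sketch, the StAX cursor API (javax.xml.stream) lets you pull events one at a time and emulate a simple path query by hand; the element and attribute names here are placeholders:

```java
import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxScan {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();

        try (FileInputStream in = new FileInputStream("huge.xml")) {
            XMLStreamReader reader = factory.createXMLStreamReader(in);

            // Pull events one at a time; only the current event is held in memory.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "order".equals(reader.getLocalName())) {
                    System.out.println("id = " + reader.getAttributeValue(null, "id"));
                }
            }
            reader.close();
        }
    }
}
```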
Did you use standard VTD or extended VTD-XML? If you use extended VTD-XML, you have the option of using memory mapping... did you try that?
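If you want to try the memory-mapped route, here is a rough sketch with the extended API; the VTDGenHuge / AutoPilotHuge classes and the MEM_MAPPED flag are as I remember them from the extended VTD-XML docs, so verify against your version, and the file name and XPath are placeholders:

```java
import com.ximpleware.extended.AutoPilotHuge;
import com.ximpleware.extended.VTDGenHuge;
import com.ximpleware.extended.VTDNavHuge;

public class HugeVtdExample {
    public static void main(String[] args) throws Exception {
        VTDGenHuge vg = new VTDGenHuge();

        // MEM_MAPPED maps the file from disk instead of reading the whole
        // document into a byte array on the heap.
        if (vg.parseFile("huge.xml", true, VTDGenHuge.MEM_MAPPED)) {
            VTDNavHuge nav = vg.getNav();
            AutoPilotHuge ap = new AutoPilotHuge(nav);
            ap.selectXPath("/orders/order");

            int count = 0;
            while (ap.evalXPath() != -1) {
                count++;
            }
            System.out.println("matched nodes: " + count);
        }
    }
}
```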
Using XPath might not be a very good idea if you plan on compiling many expressions dynamically in a long-lived application.
I'm not entirely sure how the Java version of XPath works, but in .NET an XPath expression is compiled into a dynamic assembly, which is then added to the app domain. Subsequent uses of the expression reuse the assembly now loaded into memory.
In one case where I was using XPath, this led to a situation where, I think, that same mechanism was slowly filling up memory, much like a memory leak.
My theory is that because each expression was compiled using values from the user, each compiled expression was likely unique, so a new assembly was compiled and added to the app domain every time.
Since you can't remove an assembly from the app domain without restarting the entire app domain, memory was consumed each time an expression was evaluated and could not be recovered. As a result, the code was leaking memory in the form of in-memory assemblies, and after a while, well, you know the result.