Editing a BIG XML via DOM parser
If there is a very big XML and DOM parser is used to parse it. Now there is a requirement to add/delete elements from the XML i.e edit 开发者_如何转开发the XML How to edit the XML as the entire XML will not be loaded due to memory constraints ? What could be the strategy to solve this ?
You may consider to use a SAX parser instead, which doesn't keep the whole document in memory. It will be faster and will also use much less memory.
As two other answers mentioned already, a SAX parser will do the trick. Your other alternative to DOM is a StAX parser.
Traditionally, XML APIs are either:
- DOM based - the entire document is read into memory as a tree structure for random access by the calling application
- event based - the application registers to receive events as entities are encountered within the source document.
Both have advantages; the former (for example, DOM) allows for random access to the document, the latter (e.g. SAX) requires a small memory footprint and is typically much faster.
These two access metaphors can be thought of as polar opposites. A tree based API allows unlimited, random access and manipulation, while an event based API is a 'one shot' pass through the source document.
StAX was designed as a median between these two opposites. In the StAX metaphor, the programmatic entry point is a cursor that represents a point within the document. The application moves the cursor forward - 'pulling' the information from the parser as it needs. This is different from an event based API - such as SAX - which 'pushes' data to the application - requiring the application to maintain state between events as necessary to keep track of location within the document.
StAX is my preferred approach for handling large documents. If DOM is a requirement, check out DOM implementations like Xerces that support lazy construction of DOM nodes:
- http://xerces.apache.org/xerces-j/faq-write.html#faq-4
Your assumption of memory constraint loading the XML document may only apply to DOM. VTD-XML loads the entire XML in memory, and does it efficiently (1.3x the size of XML document)... both in memory and performance...
http://sdiwc.us/digitlib/journal_paper.php?paper=00000582.pdf
Another distinct benefit, which none other XML framework in existence has, is its incremental update capability...
http://www.devx.com/xml/Article/36379
As stivlo mentioned you can use a SAX parser for reading the XML.
But for writing the XML you can write into fileoutput stream as plain text. I am sure that you will get requirement that mentions after which tag or under which tag the new data should be inserted.
精彩评论