
Reading XML from disk one record at a time with limited memory

I am trying to do a merge sort on sorted chunks of XML files on disk. There is no chance that they all fit in memory. My XML files consist of records.

Say I have n XML files. If I had enough memory, I would read the entire contents of each file into a corresponding Queue, one queue per file, compare the timestamp of the item at the head of each queue, and output the one with the smallest timestamp to another file (the merge file). This way, I merge all the little files into one big file with all the entries sorted by time.

The problem is that I don't have enough memory to read each whole file with .ReadToEnd and then pass the result to the .Parse method of XDocument.

Is there a clean way to read just enough records to keep each of the Queues filled for the next pass that compares their XElement attribute "TimeStamp", while remembering which XElement it has already read from disk?

Thank you.


An XmlReader is what you are looking for. The .NET documentation describes it as:

"Represents a reader that provides fast, non-cached, forward-only access to XML data."
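
To make the forward-only idea concrete, here is a minimal sketch of the whole merge. It is written in Python with the standard library rather than .NET, so treat it as an analogy for what you would do with XmlReader, not as the XmlReader API itself. It assumes each file looks like <records><record TimeStamp="..."/>...</records> (the element names "record" and "records" are made up for the example) and that the timestamps are strings that sort correctly as text, for example ISO 8601 in one fixed format.

```python
import heapq
import xml.etree.ElementTree as ET


def iter_records(path, tag="record"):
    """Yield one record element at a time instead of parsing the whole file."""
    context = iter(ET.iterparse(path, events=("start", "end")))
    _event, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # Runs when the consumer asks for the next record: drop everything
            # parsed so far so memory use does not grow with file size.
            root.clear()


def merge_sorted_files(paths, out_path, tag="record", key_attr="TimeStamp"):
    """K-way merge of individually sorted XML files into one sorted file."""
    streams = [iter_records(p, tag) for p in paths]
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("<records>\n")
        # heapq.merge holds only the current head record of each stream.
        for elem in heapq.merge(*streams, key=lambda e: e.get(key_attr)):
            out.write(ET.tostring(elem, encoding="unicode").strip() + "\n")
        out.write("</records>\n")
```

Each generator plays the role of one of your queues, except that it only ever holds the record at its head, so there is nothing to refill and no position to remember: the parser itself keeps track of where it is in the file.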


It has fallen out of fashion, but this is exactly the problem SAX solves. SAX is the Simple API for XML, and it is based on callbacks: you launch a read operation, and your code gets called back for each record. This may be an option, since it does not require the program to load the entire XML file (à la XmlDocument). Search for SAX for details.
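
For a feel of the callback style, here is a small sketch using Python's built-in xml.sax module (again Python rather than .NET, purely for illustration). The element name "record" and the helper names RecordHandler and read_timestamps are invented for the example.

```python
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Calls on_record with the attributes of every element named `tag`."""

    def __init__(self, on_record, tag="record"):
        super().__init__()
        self.on_record = on_record
        self.tag = tag

    def startElement(self, name, attrs):
        # The parser invokes this for each opening tag as it streams the file.
        if name == self.tag:
            self.on_record(dict(attrs.items()))


def read_timestamps(path, key_attr="TimeStamp"):
    """Collect the TimeStamp attribute of each record, in document order."""
    stamps = []
    handler = RecordHandler(lambda attrs: stamps.append(attrs.get(key_attr)))
    xml.sax.parse(path, handler)
    return stamps
```

One thing to keep in mind for the merge case: with callbacks the parser drives the loop, so interleaving reads from several files is more awkward than with a pull-style reader where your code asks for the next record.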


If you like the LINQ to XML API, this CodePlex project may suit your needs.
