Maximum size for data transfer in XML?
Has anyone ever tried passing 5 GB of data in XML? Do web services that need to transfer large amounts of data use XML over HTTP?
I am looking at making changes to a system to pass data, and I am unsure whether I should use an XML format for 5 GB of data when my main memory is only 2 GB.
Will the application break?
Thanks
XML is just a markup language/data format, and does not have any inherent size limits. You can make a 1000 GB XML file if you want.
Things that manipulate a 5 GB XML file (or any other type of 5 GB file) may break if they have not been designed to handle large file sizes. In general, if you are just uploading your large file to a web service you should be okay, because nearly any modern file-upload module is going to support caching the upload to disk as it is received so that the whole file doesn't need to be in memory. You may, however, have some issues with parsing the document once you have it on the server, depending upon what library you use to do the parsing. You may want to look into what sort of streaming XML parsers are available for your web service/platform (or even write your own parser specifically targeted at your XML document format, since then you can make simplifying assumptions that let you limit the amount of memory required at any given time).
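As a minimal sketch of the "cache to disk as it is received" idea, here is how a server-side handler might stream an incoming request body to disk in fixed-size chunks so the whole 5 GB never sits in memory. The `stream` object, chunk size, and paths are illustrative assumptions, not a specific framework's API:

```python
CHUNK_SIZE = 64 * 1024  # 64 KiB per read; the exact size is arbitrary

def save_upload(stream, dest_path):
    """Copy a file-like request body to disk without buffering it all."""
    with open(dest_path, "wb") as out:
        while True:
            chunk = stream.read(CHUNK_SIZE)
            if not chunk:
                break
            out.write(chunk)
```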
I would imagine that most web services that pass large amounts of data around would not use XML as the data transfer format. Bandwidth is expensive, and high latency or long upload times can make for a poor user experience. So I'd expect such services to more typically use an optimized binary format. A reasonable approximation of this could be obtained by simply applying gzip compression to your XML document before you send it.
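A minimal sketch of the gzip approach, using Python's standard library; the file names are placeholders. XML's repetitive tag structure usually compresses very well:

```python
import gzip
import shutil

# Stream the file through gzip in chunks, so memory use stays constant
# regardless of file size.
with open("payload.xml", "rb") as src, gzip.open("payload.xml.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)
```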
I've had some experience with large XML files, but maybe not 5GB.
If this is an existing system using XML, then think hard before changing from XML to some other format, because the change itself might be more trouble than it's worth. Compressing the file will go a long way toward helping with the network transfer. A gzipped XML file can be just as efficient as a proprietary binary format.
Your likely bottleneck will be the parsing and processing of the file. If the XML "records" are independent of each other (e.g. if the file is essentially a long list of similar elements), then you should be able to use a streaming XML parser to avoid loading everything into memory. Also consider using a non-validating parser (or switching off validation) to improve performance.
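A minimal sketch of the streaming approach with Python's standard-library `iterparse`, assuming the file is a flat list of independent `<record>` elements; the tag name and handler are placeholders:

```python
import xml.etree.ElementTree as ET

def process(record):
    """Placeholder: handle one record's worth of data here (assumption)."""
    pass

context = ET.iterparse("payload.xml", events=("start", "end"))
_, root = next(context)              # grab the root element from the first event
for event, elem in context:
    if event == "end" and elem.tag == "record":
        process(elem)                # handle the completed record
        root.clear()                 # drop finished records so memory stays flat
```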
If you can do any of the file processing in XSLT, then you might find that works better than parsing the whole file into a program for manipulation.
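For example, an XSLT transform can be applied from Python with the third-party lxml package; file names here are placeholders. Note that standard XSLT 1.0 processing loads the whole input document into memory, so for a 5 GB file you would want a streaming-capable processor (e.g. XSLT 3.0 streaming in Saxon):

```python
from lxml import etree

transform = etree.XSLT(etree.parse("transform.xslt"))  # compile the stylesheet
result = transform(etree.parse("payload.xml"))         # apply it to the document
result.write("result.xml")
```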
Depending on the network transfer time, consider using a transfer mechanism that can resume an interrupted transfer, such as FTP or BitTorrent. If a plain HTTP transfer loses the connection, you might have to start over.
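A minimal sketch of resuming an interrupted FTP download, using the `rest` argument of Python's `ftplib` (which issues the FTP REST command). Host, credentials, and file names are placeholders:

```python
import ftplib
import os

local = "payload.xml.gz"
# Resume from wherever the previous attempt left off.
offset = os.path.getsize(local) if os.path.exists(local) else 0

ftp = ftplib.FTP("ftp.example.com")
ftp.login("user", "password")
with open(local, "ab") as out:                       # append to the partial file
    ftp.retrbinary("RETR payload.xml.gz", out.write, rest=offset)
ftp.quit()
```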
5 GB seems like a large amount to transfer over a web service, but you can compress the XML, which should significantly reduce the size, since XML compresses well.
Alternatively, you could take a different approach, such as a nightly/weekly scheduled task (using Windows Task Scheduler or a Linux crontab job) that zips the file and then FTPs it across. On the other end, another scheduled task runs to import the data. Or you could have a web page or web service that triggers the import on the receiving server.
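A rough sketch of the sender side of such a job, to be run from cron or Task Scheduler; host, credentials, and paths are placeholders:

```python
import ftplib
import gzip
import shutil

# Compress the export in streaming fashion.
with open("export.xml", "rb") as src, gzip.open("export.xml.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

# Push the compressed file to the receiving server over FTP.
ftp = ftplib.FTP("ftp.example.com")
ftp.login("user", "password")
with open("export.xml.gz", "rb") as f:
    ftp.storbinary("STOR export.xml.gz", f)
ftp.quit()
```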