
Handling strings larger than 2 GB

I have an application where an XLS file with lots of data entered by the user is opened and the data in it is converted to XML. I have already mapped the columns in the XLS file to XML Maps. When I try to use the ExportXml method in XMLMaps, I get a string with the proper XML representation of the XLS file. I parse this string a bit and upload it to my server.

The problem is, when my XLS file is really large, the string produced for the XML is over 2 GB and I get an OutOfMemoryException. I understand that the limit for CLR objects is 2 GB. But in my case I need to handle this scenario. Presently I just show a message asking the user to send less data.

Any ideas on how I can do this?

EDIT:

This is just the gist of the operations I need to do on the generated XML.

  • Remove certain fields which are not needed for the server data.
  • Add something like ID numbers for each row of data.
  • Modify the values of certain elements.
  • Do validation on the data.

While streaming with an XmlReader is a good idea, I cannot perform all of these operations with that method alone. Data validation can be done by Excel itself, but the other operations cannot be done there.

Using XmlTextReader and XmlTextWriter and creating a custom method for each of the steps is a solution I had thought of. But to go through the gist above, it requires the XML document to be processed four times. That is just not efficient.


If the XML is that large, then you might be able to use Export to write to a temporary file, rather than using ExportXml to a string - http://msdn.microsoft.com/en-us/library/microsoft.office.interop.excel.xmlmap.export.aspx
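For example, a rough, untested sketch of that idea (it assumes `map` is the Microsoft.Office.Interop.Excel.XmlMap already configured for the mapped columns, and that the interop assembly is referenced):

  using System.IO;
  using Excel = Microsoft.Office.Interop.Excel;

  // Write the mapped data straight to a temporary file instead of building a
  // multi-gigabyte string in memory with ExportXml.
  // `map` is assumed to be the already-configured Excel.XmlMap.
  string tempPath = Path.Combine(Path.GetTempPath(), "excel-export.xml");
  Excel.XlXmlExportResult result = map.Export(tempPath, true); // Overwrite = true
  if (result != Excel.XlXmlExportResult.xlXmlExportSuccess)
  {
      // the export failed (e.g. schema validation against the map failed)
  }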

If you then need to parse/handle the XML in C#, then for handling such large XML structures, you'll probably be better off using a custom XmlReader (or XmlWriter) which works at the stream level. See this question for some similar advice - What is the best way to parse large XML (size of 1GB) in C#?
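As a rough illustration of that streaming approach (my own sketch, not from the linked question): the operations from the edit - dropping fields, adding row IDs, tweaking values, validating - can all be folded into a single XmlReader/XmlWriter pass, so the document is only traversed once. The element names "Row" and "Internal" are placeholders for whatever the real XML map produces, and existing attributes are not copied in this minimal version.

  using System.Xml;

  static void Transform(string inputPath, string outputPath)
  {
      int rowId = 0;
      using (XmlReader reader = XmlReader.Create(inputPath))
      using (XmlWriter writer = XmlWriter.Create(outputPath))
      {
          reader.MoveToContent();
          while (!reader.EOF)
          {
              if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "Internal")
              {
                  reader.Skip();   // drop fields the server does not need; Skip() already
                  continue;        // positions us on the next node, so don't Read() again
              }

              switch (reader.NodeType)
              {
                  case XmlNodeType.Element:
                  {
                      bool isEmpty = reader.IsEmptyElement;
                      writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                      if (reader.LocalName == "Row")
                          writer.WriteAttributeString("Id", (++rowId).ToString()); // add an ID per row
                      if (isEmpty)
                          writer.WriteEndElement();
                      break;
                  }
                  case XmlNodeType.Text:
                      // modify and/or validate the value here before writing it out
                      writer.WriteString(reader.Value);
                      break;
                  case XmlNodeType.EndElement:
                      writer.WriteEndElement();
                      break;
              }
              reader.Read();
          }
      }
  }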


I guess there is no other way than using a 64-bit OS and framework if you really need to hold the whole thing in RAM, but processing the data some other way, as suggested by Stuart, may be the better way to go...


What you need to do is to use "stream chaining", i.e. you open up an input stream which reads from your Excel file and an output stream that writes to your XML file. Then your conversion class/method takes the two streams as input and reads just enough data from the input stream to be able to write to the output.

Edit: a very simple, minimal example

Converting from a file containing:

  123
  1244125
  345345345 
  4566
  11 

to

  <List>
      <ListItem>123</ListItem>
      <ListItem>1244125</ListItem>
      ...
  </List>

using

  void Convert(Stream fromStream, Stream toStream)
  {
      using (StreamReader from = new StreamReader(fromStream))
      using (StreamWriter to = new StreamWriter(toStream))
      {
          to.WriteLine("<List>");
          while (!from.EndOfStream)
          {
              string bulk = from.ReadLine(); // in this case, a single line is sufficient
              // some code to parse the bulk or clean it up, e.g. trim whitespace
              // (real data would also need XML-escaping before being written)
              to.WriteLine(string.Format("<ListItem>{0}</ListItem>", bulk));
          }
          to.WriteLine("</List>");
      }
  }

  Convert(File.OpenRead("source.xls"), File.OpenWrite("source.xml"));

Of course you could do this in a much more elegant, abstract manner, but this is only to show my point.
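For instance, one small variation on the sketch above (my own hypothetical version, using the same placeholder element names) routes the output through an XmlWriter instead of writing raw strings, so the element content is escaped automatically:

  using System.IO;
  using System.Xml;

  void ConvertWithXmlWriter(Stream fromStream, Stream toStream)
  {
      using (StreamReader from = new StreamReader(fromStream))
      using (XmlWriter to = XmlWriter.Create(toStream, new XmlWriterSettings { Indent = true }))
      {
          to.WriteStartElement("List");
          while (!from.EndOfStream)
          {
              string line = from.ReadLine().Trim();
              to.WriteElementString("ListItem", line); // content is XML-escaped for us
          }
          to.WriteEndElement(); // </List>
      }
  }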
