c# How to read a single file with normal and xml text elements
I am receiving a stream of data from a webservice and trying to save the contents of the stream to file. The stream contains standard lines of text alongside large chunks of xml data (on a single line). The size of the file is about 800Mb.
Problem: Receiving an out of memory exception when I process the xml section of each line.
==start file
line 1
line 2
<?xml version=.....huge line etc</xml>
line 3
line4
<?xml version=.....huge line etc</xml>
==end file
Current code, as you can see when it reads in the huge xml line then it spikes the memory.
string readLine;
using (StreamReader reader = new StreamReader(downloadStream))
{
while ((readLine = reader.ReadLine()) != null)
{
streamWriter.WriteLien(readLine); //writes to file
}
}
I was trying to think of a solution where I used both a TextReader/StreamReader and XmlTextReader in combination to process each section. As I get to the xml section I could switch to the XmlTextReader and use the Read() method to read each node thus stopping the memory spike.
Any suggestions on how I could do this? Alternatively, I could create a custom XmlTextReader that was able to read in these lines? Any pointers for this?
Updated
A further problem to this is that I need to read this file back in and split out the two xml sections to separate xml files! I converted the solution to write the file using a binary writer and then started to read the file back in using a binary reader. I have text processing to detect the start of the xml section and specifically which xml section so I can map it to the correct file! However this causes problems reading in the binary file and doing detection...
using (BinaryReader reader = new BinaryReader(savedFileStream))
{
while ((streamLine = reader.ReadString()) != null)
{
if (streamLine.StartsWith("<?xml version=\"1.0\" ?><tag1"))
//xml file 1
else if (streamLine.StartsWith("<?xml version=\"1.0\" ?><tag2开发者_如何转开发"))
//xml file 2
XML may contain all content as one single line, so you'd probably better use a binary reader/writer where you can decide about the read/write size.
An example below, here we read BUFFER_SIZE bytes for each iteration:
Stream s = new MemoryStream();
Stream outputStream = new MemoryStream();
int BUFFER_SIZE = 1024;
using (BinaryReader reader = new BinaryReader(s))
{
BinaryWriter writer = new BinaryWriter(outputStream);
byte[] buffer = new byte[BUFFER_SIZE];
int read = buffer.Length;
while(read != 0)
{
read = reader.Read(buffer, 0, BUFFER_SIZE);
writer.Write(buffer, 0, read);
}
writer.Flush();
writer.Close();
}
I don't know if this causes you problems with encodings etc, but I think you will have to read the file as binary.
If all you want to do is copy one stream to another without modifying the data, you don't need the Stream text or binary helpers (StreamReader, StreamWriter, BinaryReader, BinaryWriter, etc.), simply copy the stream.
internal static class StreamExtensions
{
public static void CopyTo(this Stream readStream, Stream writeStream)
{
byte[] buffer = new byte[4096];
int read;
while ((read = readStream.Read(buffer, 0, buffer.Length)) > 0)
writeStream.Write(buffer, 0, read);
}
}
I think there is a memory leakage
Are you getting out of memory exception after processing a few lines or on the first line itself?
And there is no streamWriter.Flush() inside the while loop.
Don't you think there should be one?
精彩评论