Reading in large text files for parsing
I am working with a few text files that range from 1 to 2 GB in size. I cannot use the conventional StreamReader approach, so I decided to read each file in chunks and do my work on those. The problem is that I am not sure when the end of the file will be reached: it has been working on one file for a long time, and I am not sure how much larger I can make my buffer. Here is the code:
Dim Buffer_Size = 30000
Dim bufferread = New [Char](Buffer_Size - 1) {}
Dim bytesread As Integer = 0
Dim totalbytesread As Integer = 0
Dim sb As New StringBuilder
Do
    bytesread = inputfile.Read(bufferread, 0, Buffer_Size)
    sb.Append(bufferread)
    totalbytesread = bytesread + totalbytesread
    If sb.Length > 9999999 Then
        data = sb.ToString
        If Not data Is Nothing Then
            parsingtools.load(data)
        End If
    End If
    If totalbytesread > 1000000000 Then
        logs.constructlog("File almost done")
    End If
Loop Until inputfile.EndOfStream
Is there any control or code I can use to check how much of the file remains?
Have you looked at BufferedStream?
http://msdn.microsoft.com/en-us/library/system.io.bufferedstream%28v=VS.100%29.aspx
You can wrap your stream with that. Also, I'd set the buffer size in megabytes, not something as small as 30,000.
As for how much is left: can you just ask the stream for its length beforehand?
Below is a code snippet I use for wrapping a buffered stream around a stream. (Sorry, it's C#.)
private static void CopyTo(AzureBlobStore azureBlobStore, Stream src, Stream dest, string description)
{
    if (src == null)
        throw new ArgumentNullException("src");
    if (dest == null)
        throw new ArgumentNullException("dest");
    const int bufferSize = (AzureBlobStore.BufferSizeForStreamTransfers);
    // buffering happening internally. this is just to avoid 4gig boundary and have something to show
    int readCount;
    //long bytesTransfered = 0;
    var buffer = new byte[bufferSize];
    //string totalBytes = FormatBytes(src.Length);
    while ((readCount = src.Read(buffer, 0, buffer.Length)) != 0)
    {
        if (azureBlobStore.CancelProcessing)
        {
            break;
        }
        dest.Write(buffer, 0, readCount);
        //bytesTransfered += readCount;
        //Console.WriteLine("AzureBlobStore:CopyTo:{0}:{1} {2}", FormatBytes(bytesTransfered), totalBytes,description);
    }
}
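For the "how much of the file remains" part, here is a minimal sketch of the idea, assuming the input is an ordinary file on disk. The class name ChunkedReader, the method ReadInChunks, and the two callbacks are hypothetical names made up for this example; only FileStream, BufferedStream, StreamReader, Length and Position come from the framework. It compares the underlying stream's Position with its Length, and it hands over only the characters that Read actually returned.

```csharp
using System;
using System.IO;

public static class ChunkedReader
{
    // Hypothetical helper: reads a text file in chunks and reports progress.
    // handleChunk receives each chunk of text; reportProgress receives
    // (bytes pulled from the file so far, total bytes in the file).
    public static long ReadInChunks(string path, int bufferSize,
                                    Action<string> handleChunk,
                                    Action<long, long> reportProgress)
    {
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var buffered = new BufferedStream(file, bufferSize))
        using (var reader = new StreamReader(buffered))
        {
            long totalBytes = file.Length;   // known before reading starts
            var buffer = new char[bufferSize];
            long totalChars = 0;             // long, so a 2 GB file cannot overflow it
            int charsRead;

            // Read returns 0 once the end of the stream is reached.
            while ((charsRead = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Pass on only the characters actually read, not the whole buffer.
                handleChunk(new string(buffer, 0, charsRead));
                totalChars += charsRead;

                // Position runs slightly ahead of what has been consumed,
                // because the reader and BufferedStream both read ahead.
                reportProgress(file.Position, totalBytes);
            }
            return totalChars;
        }
    }
}
```

You could call it with something like ChunkedReader.ReadInChunks(path, 4 * 1024 * 1024, chunk => Parse(chunk), (read, total) => Console.WriteLine("{0} of {1} bytes", read, total)), where Parse stands in for your own parsing routine. Keep in mind that a chunk can end in the middle of a record, so the parser has to carry any unfinished tail over to the next chunk.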
Hope this helps.