GZipStream effectiveness
I am trying to save a big UInt16 array to a file. positionCnt is about 50,000 and stationCnt is about 2,500. Saved directly, without GZipStream, the file is about 250 MB, and an external zip program can compress it to 19 MB. With the following code the file is 507 MB. What am I doing wrong?
GZipStream cmp = new GZipStream(File.Open(cacheFileName, FileMode.Create), CompressionMode.Compress);
BinaryWriter fs = new BinaryWriter(cmp);

fs.Write((Int32)(positionCnt * stationCnt));
for (int p = 0; p < positionCnt; p++)
{
    for (int s = 0; s < stationCnt; s++)
    {
        fs.Write(BoundData[p, s]);
    }
}
fs.Close();
Not sure what version of .NET you're running on. In earlier versions, GZipStream used a window size equal to the size of the buffer you wrote from, so in your case it would try to compress each integer individually. I think that changed in .NET 4.0, but I haven't verified it.
In any case, what you want to do is create a buffered stream ahead of the GZipStream:
// Create file stream with 64 KB buffer
FileStream fs = new FileStream(filename, FileMode.Create, FileAccess.Write, FileShare.None, 65536);
GZipStream cmp = new GZipStream(fs, CompressionMode.Compress);
...
GZipStream cmp = new GZipStream(File.Open(cacheFileName, FileMode.Create), CompressionMode.Compress);
BufferedStream buffStrm = new BufferedStream(cmp, 65536);
BinaryWriter fs = new BinaryWriter(buffStrm);
This way, the GZipStream gets data in 64 KB chunks and can do a much better job of compressing. Buffers larger than 64 KB won't give you any better compression.
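Putting the pieces together, the question's save routine with the BufferedStream in place might look like the sketch below (the SaveCompressed method name and parameter layout are mine, not from the answer; the field names match the question's code):

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Sketch: same write loop as the question, but with a BufferedStream
// between the BinaryWriter and the GZipStream, so the compressor sees
// 64 KB chunks instead of individual 2-byte writes.
static void SaveCompressed(string cacheFileName, ushort[,] BoundData)
{
    int positionCnt = BoundData.GetLength(0);
    int stationCnt = BoundData.GetLength(1);

    using (var file = File.Open(cacheFileName, FileMode.Create))
    using (var cmp = new GZipStream(file, CompressionMode.Compress))
    using (var buffStrm = new BufferedStream(cmp, 65536))
    using (var fs = new BinaryWriter(buffStrm))
    {
        // Element count header, as in the question's code
        fs.Write((Int32)(positionCnt * stationCnt));
        for (int p = 0; p < positionCnt; p++)
            for (int s = 0; s < stationCnt; s++)
                fs.Write(BoundData[p, s]);
    }
}
```

The using blocks also guarantee the streams are flushed and closed in the right order (writer, then buffer, then compressor, then file), which matters for GZip output.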
For whatever reason, which is not apparent to me from a quick read of the GZip implementation in .NET, the performance is sensitive to the amount of data written at once. I benchmarked your code against a few styles of writing to the GZipStream and found that the most efficient version wrote long strides to the disk. The trade-off in this case is memory, since you need to convert the short[,] to a byte[] based on the stride length you'd like:
using (var writer = new GZipStream(File.Create("compressed.gz"),
                                   CompressionMode.Compress))
{
    var bytes = new byte[data.GetLength(1) * 2];
    for (int ii = 0; ii < data.GetLength(0); ++ii)
    {
        Buffer.BlockCopy(data, bytes.Length * ii, bytes, 0, bytes.Length);
        writer.Write(bytes, 0, bytes.Length);
    }

    // Random data written to every other 4 shorts
    // 250,000,000  uncompressed.dat
    // 165,516,035  compressed.gz  (1 row strides)
    // 411,033,852  compressed2.gz (your version)
}
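For completeness, the matching read path can use Buffer.BlockCopy in the other direction, decompressing one row-sized stride at a time back into a short[,]. This is my own sketch, not from the answer; it assumes the array dimensions are known to the caller (e.g. stored in a header, as the question's code stores the element count):

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Sketch: decompress row-sized byte strides back into a short[,].
static short[,] ReadCompressed(string path, int rows, int cols)
{
    var data = new short[rows, cols];
    var bytes = new byte[cols * 2];

    using (var reader = new GZipStream(File.OpenRead(path),
                                       CompressionMode.Decompress))
    {
        for (int ii = 0; ii < rows; ++ii)
        {
            // GZipStream.Read may return fewer bytes than requested,
            // so loop until a full row has been read.
            int read = 0;
            while (read < bytes.Length)
            {
                int n = reader.Read(bytes, read, bytes.Length - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }
            Buffer.BlockCopy(bytes, 0, data, bytes.Length * ii, bytes.Length);
        }
    }
    return data;
}
```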