GZipStream effectiveness

I am trying to save a large UInt16 array to a file. positionCnt is about 50,000 and stationCnt is about 2,500. Saved directly, without GZipStream, the file is about 250 MB, which an external zip program can compress to 19 MB. With the following code the file is 507 MB. What am I doing wrong?

GZipStream cmp = new GZipStream(File.Open(cacheFileName, FileMode.Create), CompressionMode.Compress);
BinaryWriter fs = new BinaryWriter(cmp);
fs.Write((Int32)(positionCnt * stationCnt));
for (int p = 0; p < positionCnt; p++)
{
    for (int s = 0; s < stationCnt; s++)
    {
        fs.Write(BoundData[p, s]);
    }
}
fs.Close();


Not sure what version of .NET you're running on. In earlier versions, GZipStream used a window size that was the same size as the buffer you wrote from, so in your case it would try to compress each value individually. I think they changed that in .NET 4.0, but I haven't verified that.

In any case, what you want to do is create a buffered stream ahead of the GZipStream:

// Create file stream with 64 KB buffer
FileStream fs = new FileStream(filename, FileMode.Create, FileAccess.Write, FileShare.None, 65536);
GZipStream cmp = new GZipStream(fs, CompressionMode.Compress);
...

GZipStream cmp = new GZipStream(File.Open(cacheFileName, FileMode.Create), CompressionMode.Compress);
BufferedStream buffStrm = new BufferedStream(cmp, 65536);
BinaryWriter fs = new BinaryWriter(buffStrm);

This way, the GZipStream gets data in 64 KB chunks and can do a much better job of compressing.

Buffers larger than 64 KB won't give you any better compression.
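
If you want to check whether the buffer makes a difference on your own runtime, here is a minimal sketch that compresses the same array into memory with and without a 64 KB BufferedStream in front of the GZipStream and returns the compressed size. The names GZipBufferDemo and CompressedSize are made up for illustration, and it writes to a MemoryStream rather than a file so the two sizes are easy to compare.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the question's code.
static class GZipBufferDemo
{
    // Compresses the array into memory and returns the compressed size in bytes.
    // When useBuffer is true, a 64 KB BufferedStream sits in front of the
    // GZipStream, so the compressor receives large chunks instead of 2-byte writes.
    public static long CompressedSize(ushort[,] data, bool useBuffer)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen keeps the MemoryStream usable after the GZipStream is disposed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
            {
                Stream target = useBuffer ? new BufferedStream(gzip, 65536) : (Stream)gzip;

                // Disposing the writer flushes and disposes the buffer and the
                // GZipStream, which writes the gzip trailer.
                using (var writer = new BinaryWriter(target))
                {
                    writer.Write(data.Length);      // total element count, as in the question
                    foreach (ushort value in data)  // row-major order, same as the nested loops
                        writer.Write(value);
                }
            }
            return output.Length;
        }
    }
}
```

Call CompressedSize(data, false) and CompressedSize(data, true) on the same array and compare the two numbers. They may turn out close to each other on newer runtimes.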


For whatever reason, which is not apparent to me from a quick read of the GZip implementation in .NET, the compression achieved is sensitive to the amount of data written at once. I benchmarked your code against a few styles of writing to the GZipStream and found that the most efficient version wrote the data in long strides.

The trade-off in this case is memory, as you need to convert the short[,] to a byte[] sized for the stride length you'd like:

using (var writer = new GZipStream(File.Create("compressed.gz"),
                                   CompressionMode.Compress))
{
    var bytes = new byte[data.GetLength(1) * 2];
    for (int ii = 0; ii < data.GetLength(0); ++ii)
    {
        Buffer.BlockCopy(data, bytes.Length * ii, bytes, 0, bytes.Length);
        writer.Write(bytes, 0, bytes.Length);
    }

    // Random data written to every other 4 shorts
    // 250,000,000 uncompressed.dat
    // 165,516,035 compressed.gz (1 row strides)
    // 411,033,852 compressed2.gz (your version)
}
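
To experiment with other stride lengths, the same idea can be parameterised by the number of rows per write. This is only a sketch: StrideWriter, WriteInStrides and rowsPerWrite are names I made up, and it takes any destination Stream so you can point it at a file or a MemoryStream.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the benchmark above.
static class StrideWriter
{
    // Compresses data to destination, handing the GZipStream rowsPerWrite rows
    // per Write call. Larger strides cost more memory for the staging buffer.
    public static void WriteInStrides(short[,] data, Stream destination, int rowsPerWrite)
    {
        if (rowsPerWrite <= 0)
            throw new ArgumentOutOfRangeException(nameof(rowsPerWrite));

        int rows = data.GetLength(0);
        int rowBytes = data.GetLength(1) * sizeof(short);
        var buffer = new byte[rowBytes * rowsPerWrite];

        // leaveOpen lets the caller keep using destination afterwards.
        using (var gzip = new GZipStream(destination, CompressionMode.Compress, leaveOpen: true))
        {
            for (int row = 0; row < rows; row += rowsPerWrite)
            {
                // The last stride may contain fewer rows than rowsPerWrite.
                int count = Math.Min(rowsPerWrite, rows - row) * rowBytes;

                // Buffer.BlockCopy takes byte offsets as int; this is fine for an
                // array of about 250 MB but would overflow beyond 2 GB.
                Buffer.BlockCopy(data, row * rowBytes, buffer, 0, count);
                gzip.Write(buffer, 0, count);
            }
        }
    }
}
```

With rowsPerWrite set to 1 this behaves like the one-row version above. Larger values trade more memory for fewer, bigger writes.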