Compressing a byte array with zlib in C#?
I have this uncompressed byte array:
0E 7C BD 03 6E 65 67 6C 65 63 74 00 00 00 00 00 00 00 00 00 42 52 00 00 01 02 01
00 BB 14 8D 37 0A 00 00 01 00 00 00 00 05 E9 05 E9 00 00 00 00 00 00 00 00 00 00
00 00 00 00 01 00 00 00 00 00 81 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 05 00 00 01 00 00 00
I need to compress it using the deflate algorithm (as implemented in zlib). From what I found, the C# equivalent would be GZipStream, but I can't get the compressed result to match at all.
Here is the compression code:
public byte[] compress(byte[] input)
{
    using (MemoryStream ms = new MemoryStream())
    {
        using (GZipStream deflateStream = new GZipStream(ms, CompressionMode.Compress))
        {
            deflateStream.Write(input, 0, input.Length);
        }
        return ms.ToArray();
    }
}
Here is the output of the code above:
1F 8B 08 00 00 00 00 00 04 00 ED BD 07 60 1C 49 96 25 26 2F 6D CA 7B 7F 4A F5 4A
D7 E0 74 A1 08 80 60 13 24 D8 90 40 10 EC C1 88 CD E6 92 EC 1D 69 47 23 29 AB 2A
81 CA 65 56 65 5D 66 16 40 CC ED 9D BC F7 DE 7B EF BD F7 DE 7B EF BD F7 BA 3B 9D
4E 27 F7 DF FF 3F 5C 66 64 01 6C F6 CE 4A DA C9 9E 21 80 AA C8 1F 3F 7E 7C 1F 3F
22 7E 93 9F F9 FB 7F ED 65 7E 51 E6 D3 F6 D7 30 CF 93 57 BF C6 AF F1 6B FE 5A BF
E6 AF F1 F7 FE 56 7F FC 03 F3 D9 AF FB 5F DB AF 83 E7 0F FE 35 23 1F FE BA F4 FE
AF F1 6B FC 1A FF 0F 26 EC 38 82 5C 00 00 00
Here is the result I am expecting:
78 9C E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03 4E 41 0C 0C 8C 4C 8C 0C BB 45 7A
CD B9 80 4C 90 18 EB 4B D6 97 0C 28 00 2C CC D0 C8 C8 80 09 58 21 B2 00 65 6B 08
C8
What am I doing wrong? Could someone help me out?
First, some information: DEFLATE is the compression algorithm, and it is defined in RFC 1951. DEFLATE is used in the ZLIB and GZIP formats, defined in RFC 1950 and RFC 1952 respectively, which are essentially thin wrappers around DEFLATE bytestreams. The wrappers add metadata and an integrity check: GZIP can carry the file name and a timestamp and ends with a CRC-32, while ZLIB has only a two-byte header and ends with an Adler-32.
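To make the difference between the wrappers concrete, here is a minimal sketch that guesses the container from the leading bytes. The class and method names (ContainerSniffer, Identify) are hypothetical and only for illustration; the checks themselves come from the RFCs: a GZIP member starts with the magic bytes 1F 8B, and a ZLIB header has a compression method of 8 in the low nibble of the first byte, with the first two bytes read as a big-endian number being a multiple of 31. A raw DEFLATE stream has no signature, so it can only be reported as unknown.

```csharp
using System;

static class ContainerSniffer
{
    // Guesses the wrapper format from the first two bytes of a compressed buffer.
    public static string Identify(byte[] data)
    {
        if (data == null || data.Length < 2)
            return "unknown";

        // RFC 1952: every GZIP member starts with the magic bytes 1F 8B.
        if (data[0] == 0x1F && data[1] == 0x8B)
            return "gzip";

        // RFC 1950: CM (low nibble of CMF) is 8 for DEFLATE, and
        // CMF * 256 + FLG must be a multiple of 31.
        bool methodIsDeflate = (data[0] & 0x0F) == 8;
        bool checkBitsOk = ((data[0] << 8) | data[1]) % 31 == 0;
        if (methodIsDeflate && checkBitsOk)
            return "zlib";

        // Raw DEFLATE has no header, so it cannot be recognised this way.
        return "unknown";
    }
}
```

Applied to the data in the question, the GZipStream output (starting 1F 8B) is reported as gzip, and the expected output (starting 78 9C) as zlib.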
.NET's base class library implements a DeflateStream that produces a raw DEFLATE bytestream when used for compression, and consumes a raw DEFLATE bytestream when used for decompression. .NET also provides a GZipStream, which is just a GZIP wrapper around that base. There is no ZlibStream in the .NET base class library, that is, nothing that produces or consumes ZLIB. There are workarounds, such as writing the two-byte ZLIB header and the Adler-32 trailer yourself around a raw DEFLATE stream; you can search around for them.
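One way to do it by hand is sketched below, under a few assumptions. The class and method names (ZlibWrap, Adler32, Compress) are hypothetical. The header bytes 78 9C are hard-coded and merely claim "default level"; the DEFLATE body comes from whatever the built-in DeflateStream produces, so the result should be a valid ZLIB stream but is not expected to be byte-for-byte identical to what zlib itself emits. Also, as far as I know, newer .NET versions (6 and later) ship a System.IO.Compression.ZLibStream that makes this unnecessary.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class ZlibWrap
{
    // Adler-32 checksum as described in RFC 1950.
    public static uint Adler32(byte[] data)
    {
        const uint Mod = 65521;
        uint a = 1, b = 0;
        foreach (byte x in data)
        {
            a = (a + x) % Mod;
            b = (b + a) % Mod;
        }
        return (b << 16) | a;
    }

    // Wraps a raw DEFLATE stream in a ZLIB header and trailer.
    public static byte[] Compress(byte[] input)
    {
        using (var ms = new MemoryStream())
        {
            // CMF = 0x78 (DEFLATE, 32K window); FLG = 0x9C (default level, no preset dictionary).
            ms.WriteByte(0x78);
            ms.WriteByte(0x9C);

            // leaveOpen: true, so the trailer can still be appended after the deflater is closed.
            using (var deflate = new DeflateStream(ms, CompressionMode.Compress, true))
            {
                deflate.Write(input, 0, input.Length);
            }

            // Adler-32 of the UNCOMPRESSED data, most significant byte first.
            uint adler = Adler32(input);
            ms.WriteByte((byte)(adler >> 24));
            ms.WriteByte((byte)(adler >> 16));
            ms.WriteByte((byte)(adler >> 8));
            ms.WriteByte((byte)adler);

            return ms.ToArray();
        }
    }
}
```

Calling ZlibWrap.Compress(input) returns a buffer that starts with 78 9C and ends with the four checksum bytes; everything in between is the raw DEFLATE data.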
The DEFLATE implementation in .NET exhibits a behavioral anomaly: previously compressed data can actually be inflated, significantly, when "compressed" again. This was the subject of a Connect bug raised with Microsoft, and it has been discussed here on Stack Overflow. It may explain the ineffective compression you are seeing. Microsoft rejected the bug because, although the output is ineffective at saving space, the compressed stream is not invalid; in other words, any compliant DEFLATE engine can decompress it.
In any case, as someone else posted, the compressed bytestreams produced by different compressors are not necessarily the same. The output depends on the compressor's default settings and on any settings the application specifies. Even when the compressed bytestreams differ, they may still decompress to the same original bytestream. More importantly, what you used to compress was GZIP, while it appears that what you want is ZLIB. The two are related but not the same; you cannot use GZipStream to produce a ZLIB bytestream. This is the primary source of the difference you see.
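As a small illustration of the point about settings, the sketch below compresses the same input with two different levels of the built-in DeflateStream and shows that both results inflate back to the original. It assumes .NET 4.5 or later, where the DeflateStream constructor accepts a System.IO.Compression.CompressionLevel; the class and method names (LevelDemo, Deflate, Inflate) are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

static class LevelDemo
{
    // Produces a raw DEFLATE stream at the requested compression level.
    public static byte[] Deflate(byte[] input, CompressionLevel level)
    {
        using (var ms = new MemoryStream())
        {
            using (var ds = new DeflateStream(ms, level))
            {
                ds.Write(input, 0, input.Length);
            }
            return ms.ToArray();
        }
    }

    // Inflates a raw DEFLATE stream back to the original bytes.
    public static byte[] Inflate(byte[] compressed)
    {
        using (var src = new MemoryStream(compressed))
        using (var ds = new DeflateStream(src, CompressionMode.Decompress))
        using (var dst = new MemoryStream())
        {
            ds.CopyTo(dst);
            return dst.ToArray();
        }
    }

    public static void Main()
    {
        byte[] input = new byte[92];   // mostly zeros, like the sample in the question
        input[0] = 0x0E;
        input[1] = 0x7C;

        byte[] stored = Deflate(input, CompressionLevel.NoCompression);
        byte[] packed = Deflate(input, CompressionLevel.Optimal);

        // Different bytes on the wire, same data after decompression.
        Console.WriteLine(stored.SequenceEqual(packed));
        Console.WriteLine(Inflate(stored).SequenceEqual(input));
        Console.WriteLine(Inflate(packed).SequenceEqual(input));
    }
}
```

This is why comparing compressed output byte for byte is a fragile test; comparing the decompressed data is the more reliable check.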
I think you want a ZLIB stream.
The free managed Zlib library in the DotNetZip project implements compressing streams for all three formats (DEFLATE, ZLIB, GZIP). Its DeflateStream and GZipStream work the same way as the .NET built-in classes, and there is also a ZlibStream class that does what you would expect. None of these classes exhibit the behavioral anomaly I described above.
In code it looks like this:
byte[] original = new byte[] {
    0x0E, 0x7C, 0xBD, 0x03, 0x6E, 0x65, 0x67, 0x6C,
    0x65, 0x63, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x42, 0x52, 0x00, 0x00,
    0x01, 0x02, 0x01, 0x00, 0xBB, 0x14, 0x8D, 0x37,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x05, 0xE9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x81, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00,
    0x01, 0x00, 0x00, 0x00
};
var compressed = Ionic.Zlib.ZlibStream.CompressBuffer(original);
The output is like this:
0000 78 DA E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03 x...........\...
0010 4E 41 0C 0C 8C 4C 8C 0C BB 45 7A CD 61 62 AC 2F NA...L...Ez.ab./
0020 19 B0 82 46 46 2C 82 AC 40 FD 40 0A 00 35 25 07 ...FF,..@.@..5%.
0030 CE .
To decompress,
var uncompressed = Ionic.Zlib.ZlibStream.UncompressBuffer(compressed);
You can see the documentation on the static CompressBuffer method.
EDIT
The question is raised: why is DotNetZip producing 78 DA for the first two bytes instead of 78 9C? The difference is immaterial. 78 DA encodes "max compression", while 78 9C encodes "default compression". As you can see in the data, for this small sample, the actual compressed bytes are exactly the same whether using BEST or DEFAULT. Also, the compression level information is not used during decompression. It has no effect in your application.
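For reference, the level hint lives in the top two bits of the second header byte (the FLEVEL field of FLG in RFC 1950). The sketch below decodes it; the class and method names (ZlibHeader, Flevel, Describe) are hypothetical and only for illustration.

```csharp
using System;

static class ZlibHeader
{
    // FLEVEL is stored in the two most significant bits of the FLG byte.
    public static int Flevel(byte flg)
    {
        return flg >> 6;
    }

    // Maps the FLEVEL value to the description given in RFC 1950.
    public static string Describe(byte flg)
    {
        switch (Flevel(flg))
        {
            case 0: return "fastest";
            case 1: return "fast";
            case 2: return "default";
            default: return "maximum";
        }
    }
}
```

With this, ZlibHeader.Describe(0x9C) gives "default" and ZlibHeader.Describe(0xDA) gives "maximum". The field is informational only: a decompressor does not need it to inflate the data.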
If you don't want "max" compression, in other words if you are very set on getting 78 9C as the first two bytes, even though it doesn't matter, then you cannot use the CompressBuffer convenience function, which uses the best compression level under the covers. Instead you can do this:
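var compress = new Func<byte[], byte[]>( a => {
    using (var ms = new System.IO.MemoryStream())
    {
        // Fully qualified so the enums are not confused with System.IO.Compression.
        using (var compressor =
               new Ionic.Zlib.ZlibStream( ms,
                                          Ionic.Zlib.CompressionMode.Compress,
                                          Ionic.Zlib.CompressionLevel.Default ))
        {
            compressor.Write(a, 0, a.Length);
        }
        // The compressor must be disposed before reading, so the trailer is flushed.
        return ms.ToArray();
    }
});
var original = new byte[] { .... };
var compressed = compress(original);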
The result is:
0000 78 9C E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03 x...........\...
0010 4E 41 0C 0C 8C 4C 8C 0C BB 45 7A CD 61 62 AC 2F NA...L...Ez.ab./
0020 19 B0 82 46 46 2C 82 AC 40 FD 40 0A 00 35 25 07 ...FF,..@.@..5%.
0030 CE .
Quite simply, what you got has a GZip header. What you want is the simpler Zlib header. Zlib has options for a GZip header, a Zlib header, or no header. Typically the Zlib header is used unless the data is associated with a disk file (in which case the GZip header is used). Apparently, there is no way to write a Zlib header with the .NET library (even though this is by far the most common header used in file formats). Try http://dotnetzip.codeplex.com/.
You can quickly test all the different zlib options using HexEdit (Operations->Compression->Settings). See http://www.hexedit.com . It took me 10 minutes to check your data by simply pasting your compressed bytes into HexEdit and decompressing. I also tried compressing your original bytes with GZip and Zlib headers as a double-check. Note that you may have to fiddle with the settings to get exactly the bytes you were expecting.