Compression - Bit

I want to compress a file that looks like a bitmap index (a binary file that contains only "0" and "1" values).

When I use a whole byte to represent each "0" or "1", the compression ratio is good because the data has low randomness.

Instead of using a byte to represent a "0" or "1", I would like to use a bit. Example: number 8 = 00001000, number 10 = 00001010.

So the uncompressed file will be 8 times smaller than the one with the bitmap index using byte to represent 0 and 1.

But when I compress this file, my ratio is very poor because of the high randomness of the data.

So my question is: are there any compression algorithms whose smallest unit is a bit instead of a byte? Or are there any tricks I can use to lower the randomness of the data?


Are there any compression algorithms whose smallest unit is a bit instead of a byte?

Any sane entropy-based compression algorithm works at the bit level and will therefore show the expected behaviour. When you pass it an input that consists only of "00000001" and "00000000" bytes, the encoder in some sense "sees" that the input consists of a great many "0" bits sprinkled with a few "1"s. It adapts to this situation and achieves a good compression ratio by using its tables (or whatever the compressor uses to represent its state) to handle this case.

If you really use all the bits in a byte, the entropy ("randomness") of the input is much higher. So although your input is only 1/8 of the size to start with, you also make the compressor's job considerably harder, and its compression ratio will suffer. Even so, I absolutely think this is the way to go, because you then don't rely on a compressor that may or may not be good at picking up on the "lots of 0s" pattern in your input data.
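For illustration, here is a minimal sketch of the packing step in Python. The function names pack_bits and unpack_bits are made up for this example, and it assumes most-significant-bit-first order (matching the 8 = 00001000 example in the question) with zero padding in the last byte.

```python
def pack_bits(bits):
    """Pack an iterable of 0/1 values into bytes, most significant bit first."""
    out = bytearray()
    current = 0   # bits collected so far for the byte being built
    count = 0     # how many bits are in `current`
    for bit in bits:
        current = (current << 1) | (bit & 1)
        count += 1
        if count == 8:
            out.append(current)
            current = 0
            count = 0
    if count:
        # Pad the final partial byte with zero bits on the right.
        out.append(current << (8 - count))
    return bytes(out)


def unpack_bits(data, n_bits):
    """Inverse of pack_bits: recover the first n_bits flags."""
    bits = []
    for byte in data:
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    # Drop the padding bits that pack_bits may have added.
    return bits[:n_bits]
```

Because of the padding, the original number of bits has to be stored somewhere alongside the packed data so that unpack_bits can trim the result.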

Or are there any tricks I can use to lower the randomness of the data?

These "tricks" involve transforming your input data so that its entropy is lower before it reaches the compressor. What you can do here really depends on the nature of your input data. If it's truly black-and-white "images", you might want to have a look at JBIG or check out the filters defined in the PNG image standard.
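As one example of such a transformation (not JBIG or a PNG filter, just a simpler idea that may suit a sparse bitmap index), you could store the distances between consecutive "1" bits instead of the bits themselves. This is only a sketch under the assumption that "1" bits are rare; the names gaps_from_bits and bits_from_gaps are hypothetical.

```python
def gaps_from_bits(bits):
    """Return the distance from each 1 bit to the previous 1 bit.

    The first gap is measured from an imaginary position -1, so a 1 at
    index 0 yields a gap of 1.
    """
    gaps = []
    last = -1
    for i, bit in enumerate(bits):
        if bit:
            gaps.append(i - last)
            last = i
    return gaps


def bits_from_gaps(gaps, n_bits):
    """Inverse of gaps_from_bits: rebuild a list of n_bits flags."""
    bits = [0] * n_bits
    pos = -1
    for gap in gaps:
        pos += gap
        bits[pos] = 1
    return bits
```

The list of gaps would still need to be serialized (for example as variable-length integers) before being handed to a general-purpose compressor. Whether this beats compressing the packed bits directly depends on how sparse and how clustered the "1" bits are.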


But when I compress this file, my ratio is very poor because of the high randomness of the data.

Compression ratio is a red herring here. You should instead be comparing the compressed file sizes.

In theory, there should be no difference in the compressed file sizes, since it's the same data.

Uncompressed, the bits-as-bytes file is 8 times larger. It compresses well (theoretically, to 1/8 of its size), but that only brings it down to the size of the uncompressed packed-bits version, no better.

(I've assumed you're writing 8-bit bytes here. If you're writing 32-bit integers, substitute 32 for 8 above.)
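If you want to check this on your own data, a quick experiment along these lines compares the absolute sizes. It is a sketch only: it uses Python's zlib as a stand-in for whatever compressor you actually use, and synthetic random bits with an assumed density of 1% "1" bits, so the numbers it prints say nothing definite about your file.

```python
import random
import zlib


def make_bits(n_bits, density, seed=0):
    """Generate synthetic 0/1 flags; `density` is the chance of a 1."""
    rng = random.Random(seed)
    return [1 if rng.random() < density else 0 for _ in range(n_bits)]


def pack(bits):
    """Pack 0/1 flags into bytes, 8 per byte, MSB first, zero padded."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk = chunk + [0] * (8 - len(chunk))
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def compare(n_bits=80_000, density=0.01):
    """Return raw and zlib-compressed sizes for both representations."""
    bits = make_bits(n_bits, density)
    as_bytes = bytes(bits)   # one byte per flag: 0x00 or 0x01
    packed = pack(bits)      # eight flags per byte
    return {
        "bytes_raw": len(as_bytes),
        "bytes_compressed": len(zlib.compress(as_bytes, 9)),
        "packed_raw": len(packed),
        "packed_compressed": len(zlib.compress(packed, 9)),
    }


if __name__ == "__main__":
    for name, size in compare().items():
        print(name, size)
```

The figures to compare are bytes_compressed and packed_compressed, not the two ratios. A real compressor will not match the theoretical limit exactly, so expect the two sizes to differ somewhat.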
