开发者

What is the maximum theoretically possible compression rate?

This is a theoretical question, so expect that many details here are not computable in practice or even in theory.

Let's say I have a string s that I want to compress. The result should be a self-extracting binary (can be x86 assembler, but it can also be some other hypothetical Turing-complete low level language) which outputs s.

Now, we can easily iterate through all possible such binaries and programs, ordered by size. Let B_s be the sub-list of these binaries who output s (of course B_s is uncomputable).

As every set of positive integers must have a minimum, there must be a smallest program b_min_s in B_s.

For what languages (i.e. set of strings) do we know something about开发者_如何转开发 the size of b_min_s? Maybe only an estimation. (I can construct some trivial examples where I can always even calculate B_s and also b_min_s, but I am interested in more interesting languages.)


This is Kolmogorov complexity, and you are correct that it's not computable. If it were, you could create a paradoxical program of length n that printed a string with Kolmogorov complexity m > n.

Clearly, you can bound b_min_s for given inputs. However, as far as I know most of the efforts to do so have been existence proofs. For instance, there is an ongoing competition to compress English Wikipedia.


Claude Shannon estimated the information density of the English language to be somewhere between 0.6 and 1.3 bits per character in his 1951 paper Prediction and Entropy of Printed English (PDF, 1.6 MB. Bell Sys. Tech. J (3) p. 50-64).


The maximal (avarage) compression rate possible is 1:1.
The number of possible inputs is equal to the number of outputs.
It has to be to be able to map the output back to the input.
To be able to store the output you need container at the same size as the minimal container for the input - giving 1:1 compression rate.


Basically, you need enough information to rebuild your original information. I guess the other answers are more helpful for your theoretical discussion, but just keep this in mind.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜