Compression of numeric strings

2022-12-20 01:44

Can anyone suggest compression algorithms that operate on numeric strings of 20-30 digits?

You can easily compress a 30-character string down to 15 bytes by storing each digit as a four-bit binary value (binary-coded decimal). For example, 1592 can be represented as a series of four-bit values:

0001 0101 1001 0010

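Packed two four-bit values per byte, this gives the two bytes 0x15 and 0x92. They display as §Т in some DOS code pages, but they are not printable characters in standard ASCII, so treat the result as binary data rather than text.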
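A minimal sketch of this packing in Python, assuming the input contains only the ASCII digits 0-9. The names `pack_bcd` and `unpack_bcd` are illustrative, and using the otherwise unused nibble 0xF as padding for odd-length input is an assumption of this sketch, not part of the answer above.

```python
def pack_bcd(digits: str) -> bytes:
    """Pack a string of decimal digits into bytes, two digits per byte."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected ASCII decimal digits only")
    nibbles = [int(ch) for ch in digits]
    # Assumption: pad odd-length input with 0xF, a nibble no digit uses.
    if len(nibbles) % 2:
        nibbles.append(0xF)
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack_bcd(data: bytes) -> str:
    """Reverse pack_bcd, dropping the 0xF padding nibble if present."""
    out = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            if nibble != 0xF:
                out.append(str(nibble))
    return "".join(out)


packed = pack_bcd("1592")
print(packed.hex())        # hex form of the two packed bytes
print(unpack_bcd(packed))  # back to the digit string
```

A 30-digit string packs into 15 bytes this way, and leading zeros survive the round trip because every digit is stored individually.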

Further, if your strings contain many identical consecutive digits, you can implement a variation of Run-Length Encoding.
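As a rough illustration of run-length encoding on digit strings, here is a small Python sketch. The (digit, run length) pair representation and the function names are assumptions chosen for readability; a real format would also need a compact byte layout for the pairs.

```python
from itertools import groupby


def rle_encode(digits):
    """Collapse each run of identical digits into a (digit, run_length) pair."""
    return [(d, len(list(run))) for d, run in groupby(digits)]


def rle_decode(runs):
    """Expand (digit, run_length) pairs back into the original string."""
    return "".join(d * n for d, n in runs)


runs = rle_encode("1000000777")
print(runs)
print(rle_decode(runs))
```

This only pays off when runs are long. For strings with few repeated digits, the pairs can take more space than the original.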

If the strings can also contain floating-point numbers, there are 11 possible symbols:

[0,1,2,3,4,5,6,7,8,9, .]

This means that you need 4 bits per symbol, since 3 bits can represent at most 8 symbols. Storing each symbol in 4 bits instead of an 8-bit character halves the size.

If you only have integer digits in your string, an easy solution is to convert the number to hexadecimal. You still use 4 bits per symbol, but you get a better compression ratio, since none of the 16 possible values is wasted.

If you use Huffman compression you will get the best bits-per-symbol ratio achievable when each symbol is assigned a whole number of bits. You can read more about Huffman compression here.
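To make the Huffman idea concrete, here is a hedged Python sketch that builds a code table from the digit frequencies of one string. The function name `huffman_code` and the dictionary-based tree merging are choices made for this example only.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a {symbol: bit string} prefix code built from symbol frequencies."""
    if not text:
        raise ValueError("cannot build a code for empty input")
    freq = Counter(text)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, partial code table).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


sample = "0000000011123"
table = huffman_code(sample)
encoded = "".join(table[ch] for ch in sample)
print(table)
print(len(encoded), "bits versus", 4 * len(sample), "bits at 4 bits per digit")
```

For a single 20-30 digit string, the code table itself has to be stored or agreed on in advance, so the gain mostly appears when many strings with a skewed digit distribution share one table.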

Split it into two 15-digit numbers and store them as two 64-bit integers? Or are they floats?

Break it up into a couple of unsigned ints?

"9347692367596047327509604839"

becomes:

9 347692367 596047327 509604839
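The split above can be sketched in Python as follows, assuming 9-digit chunks (any 9-digit decimal value fits in an unsigned 32-bit integer) and big-endian packing with the standard `struct` module. The function names and the chunk size constant are illustrative.

```python
import struct

CHUNK = 9  # 9 decimal digits always fit in an unsigned 32-bit integer


def split_chunks(digits):
    """Split from the right so that only the first chunk may be shorter."""
    first = len(digits) % CHUNK or CHUNK
    parts = [digits[:first]]
    parts += [digits[i:i + CHUNK] for i in range(first, len(digits), CHUNK)]
    return [int(p) for p in parts]


def pack_chunks(digits):
    """Store each chunk as a big-endian unsigned 32-bit integer."""
    values = split_chunks(digits)
    return struct.pack(f">{len(values)}I", *values)


def unpack_chunks(data):
    """Rebuild the digit string, zero-padding every chunk after the first."""
    values = struct.unpack(f">{len(data) // 4}I", data)
    return str(values[0]) + "".join(f"{v:0{CHUNK}d}" for v in values[1:])


s = "9347692367596047327509604839"
print(split_chunks(s))
print(len(pack_chunks(s)), "bytes")
print(unpack_chunks(pack_chunks(s)) == s)
```

Note that leading zeros of the whole string are lost in this sketch, because the first chunk is stored as a plain integer. If they matter, the original length would have to be stored as well.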

One obvious solution is to "compress" them as a binary numeric representation rather than a string representation. See this Stack Overflow question for example libraries.

I would definitely go for the easiest solution, and just store them as integers (of a suitable size, be it 32-bit, 64-bit or 128-bit, depending on needs). Compressing them with an algorithm designed for characters would waste a lot of space, since it would have to cater for many more than 10 different values (0-9) per character.
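A short Python sketch of the store-it-as-an-integer approach, using Python's arbitrary-precision `int` with `int.to_bytes` and `int.from_bytes`. The function names and the optional `width` parameter for restoring leading zeros are assumptions of this example.

```python
def to_binary(digits):
    """Convert a decimal digit string to the shortest big-endian byte string."""
    n = int(digits)
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big")


def from_binary(data, width=0):
    """Convert back to decimal text, left-padding with zeros up to width."""
    return str(int.from_bytes(data, "big")).zfill(width)


largest = "9" * 30
print(len(to_binary(largest)), "bytes for the largest 30-digit number")
print(from_binary(to_binary("0042"), width=4))
```

A 30-digit number needs up to 100 bits, so it fits in 13 bytes, compared with 15 bytes for four bits per digit and 30 bytes for the plain string. It does not fit in a 64-bit integer.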

One of the most common ways to compress numbers (assuming you have more than one to compress, since it's hard to compress a single value) is delta encoding. It works on the principle that if you know the first number is x, and the numbers after it are relatively similar, you can encode the subsequent numbers as (x+c1), (x+c2), etc.

In this scheme, you only have to encode the full x value once, and if your c values are much smaller than x, you can save a lot of space. You can also use a version of this that sorts the numbers first, so that each delta refers to the number last seen instead of to a single base value. With this method you can cover a wider range of numbers more efficiently.
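The sorted variant, where each delta is taken against the previous number, might look like this in Python. This is a sketch under the assumption that the numbers are already parsed into integers; the function names are illustrative.

```python
def delta_encode(numbers):
    """Return the first value followed by the successive differences."""
    if not numbers:
        return []
    deltas = [numbers[0]]
    for prev, cur in zip(numbers, numbers[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum over the deltas."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


values = sorted([
    9347692367596047327509604839,
    9347692367596047327509604901,
    9347692367596047327509605000,
])
encoded = delta_encode(values)
print(encoded)
print(delta_decode(encoded) == values)
```

Only the first entry needs the full width. The remaining deltas are small and can be stored in far fewer bits, provided the numbers really are close together.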
