Can a compression algorithm "learn" on a set of files and compress them better?
Is there a compression library that supports "learning" on some set of files, or using some files as a base for compressing other files?
This can be useful if we want to compress many similar files while retaining fast access to each of them.
Something like:
# compression:
compressor.learn_on_data(standard_data);
compressor.compress(data, data_compressed);
# decompression:
decompressor.learn_on_data(the_same_standard_data);
decompressor.decompress(data_compressed, data);
What is this called (I think "delta compression" is something a bit different)? Are there implementations of this in popular compression libraries? I expect it to work by, for example, pre-filling dictionaries with standard data.
Yes, it works. Although there are many techniques for this, the easiest one you'll find is called "dictionary pre-filling". In short, you provide a file whose last part is "digested" (up to the maximum window size, which can be anywhere from 4 KB to 64 MB depending on your algorithm) and can therefore be used to better compress the next file.
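As a minimal sketch (not from the original question), Python's standard zlib module exposes this kind of pre-filled dictionary through its zdict parameter; the "standard data" below is invented for illustration:

import zlib

# Hypothetical shared "standard data"; in practice this would be a
# representative sample of the files you want to compress.
standard_data = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n" * 20
data = b"GET /about.html HTTP/1.1\r\nHost: example.com\r\nAccept: text/html\r\n"

# Pre-fill the compressor's window with the shared dictionary.
comp = zlib.compressobj(zdict=standard_data)
compressed = comp.compress(data) + comp.flush()

# The decompressor must be primed with the exact same dictionary.
decomp = zlib.decompressobj(zdict=standard_data)
assert decomp.decompress(compressed) == data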
In practice, this is similar to "solid mode", where all files of identical type within an archive are grouped together so that the previous file can serve as a dictionary for the next one, which improves the compression ratio.
Downside: the same dictionary must be provided to both the compressor and the decompressor.
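If you want the "learning" step to be explicit, zstd can train a dictionary from a set of sample files, and the trained dictionary is exactly the object that must be shared between the two sides. A rough sketch with the python-zstandard bindings (the sample corpus here is made up; a real one should be larger and more varied for training to succeed):

import zstandard as zstd

# Invented sample corpus; in practice, pass in your real set of similar files.
samples = [b"user=%06d;role=admin;active=true;payload=%0128d\n" % (i, i) * 32
           for i in range(200)]

# "Learn" a dictionary from the samples (the 16 KB target size is just a guess).
dictionary = zstd.train_dictionary(16 * 1024, samples)

# The same trained dictionary must be handed to both sides.
compressor = zstd.ZstdCompressor(dict_data=dictionary)
decompressor = zstd.ZstdDecompressor(dict_data=dictionary)

compressed = compressor.compress(samples[0])
assert decompressor.decompress(compressed) == samples[0]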