What are good compression-oriented application programming interfaces (APIs)?

Do people still use the 1991 "data compression interface" draft standard and the 1991 "Stream transformation algorithm interface" draft standard (both by Ross Williams)? Are there any alternatives to those draft standards?

(I'm particularly looking for C APIs, but links to compression-oriented APIs in C++ and other languages would also be appreciated).

I'm experimenting with some data compression algorithms. Typically the compressed file I'm producing is composed of a series of blocks, with a block header indicating which compression algorithm needs to be used to decompress the remaining data in that block -- Huffman, LZW, LZP, "stored uncompressed", etc.

The block header also indicates which filter(s) need to be used to convert the intermediate stream or buffer of data from the decompressor into a lossless copy of the original plaintext -- Burrows–Wheeler transform, delta encoding, XML end-tag restoration, "copy unchanged", etc.

I could use a huge switch statement that selects on the "compression type" and calls the selected decompression or filter algorithm, each procedure with its own special number and order of parameters. It would simplify my code if every algorithm had exactly the same API instead -- the same number and order of parameters, etc.
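To make the "same API for every algorithm" idea concrete, here is a minimal sketch in C of a function-pointer table indexed by the method ID from the block header. The names `codec_fn`, `run_codec`, and the `METHOD_*` IDs are made up for illustration, and the two codecs are deliberately trivial ("stored" and a byte-wise delta decoder); this is one possible shape, not an existing standard.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical uniform signature: every decoder or filter reads
 * src[0..src_len) and writes at most dst_cap bytes to dst.
 * Returns the number of bytes written, or -1 on error. */
typedef long (*codec_fn)(const unsigned char *src, size_t src_len,
                         unsigned char *dst, size_t dst_cap);

/* "Stored uncompressed": copy input to output unchanged. */
static long decode_stored(const unsigned char *src, size_t src_len,
                          unsigned char *dst, size_t dst_cap)
{
    if (src_len > dst_cap) return -1;
    memcpy(dst, src, src_len);
    return (long)src_len;
}

/* Delta decoding: each output byte is the running sum (mod 256)
 * of the input bytes seen so far. */
static long decode_delta(const unsigned char *src, size_t src_len,
                         unsigned char *dst, size_t dst_cap)
{
    unsigned char prev = 0;
    size_t i;
    if (src_len > dst_cap) return -1;
    for (i = 0; i < src_len; i++) {
        prev = (unsigned char)(prev + src[i]);
        dst[i] = prev;
    }
    return (long)src_len;
}

/* Hypothetical method IDs as they might appear in a block header. */
enum { METHOD_STORED = 0, METHOD_DELTA = 1, METHOD_COUNT };

static const codec_fn codec_table[METHOD_COUNT] = {
    decode_stored,
    decode_delta
};

/* Dispatch on the method ID; the table lookup replaces the switch. */
long run_codec(unsigned method, const unsigned char *src, size_t src_len,
               unsigned char *dst, size_t dst_cap)
{
    if (method >= METHOD_COUNT) return -1;
    return codec_table[method](src, src_len, dst, dst_cap);
}
```

Adding a new algorithm then means adding one function and one table entry; the code that parses block headers does not change.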

Rather than waiting for the decompressor to run through the entire input stream before handing its output to the first filter, it would be nice if the API let decompressed output come out of the final filter "relatively quickly" (low latency) after relatively little compressed data has been fed into the initial decompressor. It would also be nice if the API could be used in systems that have only one thread or process.
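One way to get that low-latency, single-threaded behaviour is a "push" style interface where each stage consumes whatever input is available, produces whatever output fits, and keeps its state between calls. zlib's `inflate()` works this way on a `z_stream` with `next_in`/`avail_in`/`next_out`/`avail_out` fields. The sketch below borrows that shape; `stream_io`, `stage`, `make_delta_stage`, and `chain_push` are hypothetical names, and both stages are simple delta decoders standing in for a real decompressor and filter.

```c
#include <stddef.h>

/* Hypothetical stream descriptor, loosely modeled on zlib's z_stream. */
typedef struct {
    const unsigned char *next_in;  /* next input byte */
    size_t avail_in;               /* input bytes remaining */
    unsigned char *next_out;       /* next free output byte */
    size_t avail_out;              /* output space remaining */
} stream_io;

/* A stage processes as much as it can and returns. Its state lives in
 * the struct, so it never needs the whole input at once. */
typedef struct stage {
    void (*step)(struct stage *self, stream_io *io);
    unsigned char prev;            /* running sum for the delta decoder */
} stage;

static void delta_step(stage *self, stream_io *io)
{
    while (io->avail_in > 0 && io->avail_out > 0) {
        self->prev = (unsigned char)(self->prev + *io->next_in++);
        *io->next_out++ = self->prev;
        io->avail_in--;
        io->avail_out--;
    }
}

stage make_delta_stage(void)
{
    stage s;
    s.step = delta_step;
    s.prev = 0;
    return s;
}

/* Push one chunk through a two-stage chain using a small intermediate
 * buffer, all in the calling thread. Returns the bytes written to out.
 * Limitation of this sketch: if out fills up, bytes still sitting in
 * the intermediate buffer are dropped; a real API would keep them. */
size_t chain_push(stage *first, stage *second,
                  const unsigned char *in, size_t in_len,
                  unsigned char *out, size_t out_cap)
{
    unsigned char mid[16];
    stream_io a, b;
    size_t produced = 0;

    a.next_in = in;
    a.avail_in = in_len;
    while (a.avail_in > 0) {
        size_t in_before = a.avail_in;
        size_t room = out_cap - produced;
        size_t mid_len;

        a.next_out = mid;
        a.avail_out = sizeof mid;
        first->step(first, &a);
        mid_len = sizeof mid - a.avail_out;

        b.next_in = mid;
        b.avail_in = mid_len;
        b.next_out = out + produced;
        b.avail_out = room;
        second->step(second, &b);
        produced += room - b.avail_out;

        /* Stop if the first stage made no progress or out is full. */
        if ((a.avail_in == in_before && mid_len == 0) || b.avail_in > 0)
            break;
    }
    return produced;
}
```

Calling `chain_push` once per arriving chunk yields output for that chunk immediately, and because each stage remembers its state, the result is the same as if the whole stream had been processed in one go.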

Currently I'm kludging together my own internal API, re-using existing compression algorithm implementations by writing short wrapper functions to convert between my internal API and the special number and order of parameters used by each implementation.
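For illustration, the wrapper approach might look roughly like the sketch below. `libfoo_unpack` and `libbar_expand` are invented stand-ins (not real library functions) for two third-party routines with different parameter orders, size types, and return conventions; `wrap_libfoo` and `wrap_libbar` adapt them to one hypothetical internal signature, `codec_fn`.

```c
#include <stddef.h>
#include <limits.h>
#include <string.h>

/* Hypothetical internal signature: returns bytes written, or -1. */
typedef long (*codec_fn)(const unsigned char *src, size_t src_len,
                         unsigned char *dst, size_t dst_cap);

/* Stand-in for a third-party routine: output first, capacity passed in
 * and length passed out through a pointer, 0 means success. */
static int libfoo_unpack(unsigned char *out, unsigned long *out_len,
                         const unsigned char *in, unsigned long in_len)
{
    if (in_len > *out_len) return 1;
    memcpy(out, in, in_len);
    *out_len = in_len;
    return 0;
}

/* Stand-in for another routine: input first, char pointers, int sizes,
 * returns the output size or a negative value. It expands
 * (count, byte) pairs, i.e. a toy run-length decoder. */
static int libbar_expand(const char *in, char *out, int in_len, int out_max)
{
    int i, n = 0;
    if (in_len % 2 != 0) return -1;
    for (i = 0; i < in_len; i += 2) {
        int count = (unsigned char)in[i];
        if (count > out_max - n) return -1;
        memset(out + n, in[i + 1], (size_t)count);
        n += count;
    }
    return n;
}

/* Wrapper: reorder parameters and translate the error convention. */
long wrap_libfoo(const unsigned char *src, size_t src_len,
                 unsigned char *dst, size_t dst_cap)
{
    unsigned long out_len;
    if (src_len > (size_t)LONG_MAX || dst_cap > (size_t)LONG_MAX) return -1;
    out_len = (unsigned long)dst_cap;
    if (libfoo_unpack(dst, &out_len, src, (unsigned long)src_len) != 0)
        return -1;
    return (long)out_len;
}

/* Wrapper: convert pointer and size types, guarding against overflow. */
long wrap_libbar(const unsigned char *src, size_t src_len,
                 unsigned char *dst, size_t dst_cap)
{
    int n;
    if (src_len > (size_t)INT_MAX || dst_cap > (size_t)INT_MAX) return -1;
    n = libbar_expand((const char *)src, (char *)dst,
                      (int)src_len, (int)dst_cap);
    return n < 0 ? -1 : (long)n;
}
```

Each wrapper is only a few lines, and once written, both routines can sit in the same table of `codec_fn` pointers.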

Is there an already-existing API that I could use rather than designing my own from scratch? Where can I find such an API?


I fear such an "API" does not exist. In particular, a requirement such as "starting stage 2 while stage 1 is ongoing and unfinished" is completely implementation-dependent and cannot be added later by an API layer.

By the way, Maciej Adamczyk recently attempted the same thing as you. He made an open-source benchmark comparing multiple compression algorithms in a block-compression scenario. The code can be found here: http://encode.ru/threads/1371-Filesystem-benchmark?p=26630&viewfull=1#post26630

He had to "encapsulate" all these different compressor interfaces in order to cope with the differences. Now for the good news: most compressors tend to have a relatively similar C interface when it comes to compressing a block of data. As an example, they can be as simple as this one: http://code.google.com/p/lz4/source/browse/trunk/lz4.h So, in the end, the adaptation layer is not so heavy.
