Best compression library/format for compressing on the fly and binary search?

I'm looking for a compression library/format with the following abilities:

  1. Can compress my data as I write it.
  2. Will let me efficiently binary search through the file.
  3. Will let me efficiently traverse the file in reverse.

Context: I'm writing a C++ app that listens for incoming data, normalizes it, and then needs to persist the normalized output to disk. The data already compresses pretty well when I run gzip on the files by hand. However, the amount of incoming data is potentially massive, so I'd like to do the compression on the fly. Each entry in the file has a timestamp associated with it, and I may only be interested in the chunk of data between time X and time Y, so to find that chunk quickly I'd like to be able to binary search, and ideally iterate in reverse as well. Do any particular compression libraries/formats stand out as being particularly good for my project? I've found libraries that satisfy #1, but whether they support #2 or #3 is often undocumented.


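You can compress a few chunks at a time, compressing each block independently so that every block can be decompressed on its own, and then keep a small uncompressed index recording where each block starts in the compressed data along with the timestamp of its first chunk. Because the blocks stay in timestamp order, you can binary search the index to find the block containing time X, decompress forward from there until you pass time Y, and traverse in reverse by walking the index backwards and decoding one block at a time. That gives near-random access to the chunks. The extreme case is to compress each chunk individually, although that may hurt your compression ratio, since the compressor has less data per block in which to find repetition.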
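Here is a minimal C++ sketch of the blocked layout described above, under several assumptions: entries arrive in timestamp order, an in-memory `std::string` stands in for the data file and a `std::vector` stands in for a separate index file, and all names (`Entry`, `IndexEntry`, `BlockWriter`, `read_block`, `range`, `for_each_reverse`) are hypothetical. The codec is a trivial run-length encoder used only so the example compiles without dependencies; it will usually not shrink real data, so in practice you would swap in a real compressor (for instance zlib's `compress()`/`uncompress()`) while keeping each block independent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using u64 = std::uint64_t;
struct Entry { u64 ts; std::string payload; };  // hypothetical record type

// Placeholder codec (byte-wise RLE) standing in for a real compressor like zlib.
std::string rle_encode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0, j = 0; i < in.size(); i = j) {
        j = i;
        while (j < in.size() && in[j] == in[i] && j - i < 255) ++j;
        out.push_back(static_cast<char>(j - i));  // run length, 1..255
        out.push_back(in[i]);                     // the repeated byte
    }
    return out;
}
std::string rle_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2)
        out.append(static_cast<unsigned char>(in[i]), in[i + 1]);
    return out;
}
// Little-endian framing for the uncompressed contents of a block.
void put_u64(std::string& s, u64 v) {
    for (int i = 0; i < 8; ++i) s.push_back(static_cast<char>(v >> (8 * i)));
}
u64 get_u64(const std::string& s, std::size_t& pos) {
    u64 v = 0;
    for (int i = 0; i < 8; ++i) v |= u64(static_cast<unsigned char>(s[pos++])) << (8 * i);
    return v;
}

// One small, uncompressed index record per compressed block.
struct IndexEntry { u64 first_ts; std::size_t offset, size; };
class BlockWriter {
public:
    explicit BlockWriter(std::size_t per_block) : per_block_(per_block) {}
    void append(const Entry& e) {
        assert(e.ts >= last_ts_);  // entries must arrive in timestamp order
        last_ts_ = e.ts;
        if (count_ == 0) first_ts_ = e.ts;
        put_u64(pending_, e.ts);
        put_u64(pending_, e.payload.size());
        pending_ += e.payload;
        if (++count_ == per_block_) flush();  // compress on the fly
    }
    void flush() {  // call once more at shutdown to emit a short final block
        if (count_ == 0) return;
        std::string block = rle_encode(pending_);
        index.push_back({first_ts_, compressed.size(), block.size()});
        compressed += block;  // a real app would append this to the data file
        pending_.clear(); count_ = 0;
    }
    std::string compressed;         // stands in for the data file
    std::vector<IndexEntry> index;  // stands in for a separate index file
private:
    std::size_t per_block_, count_ = 0;
    std::string pending_;
    u64 first_ts_ = 0, last_ts_ = 0;
};

// Decompresses one block and parses it back into entries.
std::vector<Entry> read_block(const std::string& file, const IndexEntry& ix) {
    std::string raw = rle_decode(file.substr(ix.offset, ix.size));
    std::vector<Entry> out;
    for (std::size_t pos = 0; pos < raw.size();) {
        u64 ts = get_u64(raw, pos); u64 len = get_u64(raw, pos);
        out.push_back({ts, raw.substr(pos, len)});
        pos += len;
    }
    return out;
}

// All entries with x <= ts <= y: binary-search the index for the first block
// starting at >= x, step back one (it may still hold ts >= x), decode forward.
std::vector<Entry> range(const BlockWriter& w, u64 x, u64 y) {
    auto it = std::lower_bound(w.index.begin(), w.index.end(), x,
        [](const IndexEntry& e, u64 t) { return e.first_ts < t; });
    std::size_t b = static_cast<std::size_t>(it - w.index.begin());
    if (b > 0) --b;
    std::vector<Entry> out;
    for (; b < w.index.size() && w.index[b].first_ts <= y; ++b)
        for (const Entry& e : read_block(w.compressed, w.index[b]))
            if (e.ts >= x && e.ts <= y) out.push_back(e);
    return out;
}

// Reverse traversal: walk the index backwards, each block last-to-first.
template <class F> void for_each_reverse(const BlockWriter& w, F f) {
    for (std::size_t b = w.index.size(); b-- > 0;) {
        std::vector<Entry> es = read_block(w.compressed, w.index[b]);
        for (auto it = es.rbegin(); it != es.rend(); ++it) f(*it);
    }
}
```

For example, with 20 entries at timestamps 0, 10, ..., 190 and 4 entries per block, `range(w, 45, 95)` binary searches the five-record index and decodes only blocks 1 and 2 (timestamps 40 to 110) to return the entries from 50 through 90. The entries-per-block parameter is the trade-off the answer mentions: larger blocks give the compressor more context, while smaller blocks mean less wasted decoding per lookup.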
