
mmap-loadable data structure library for C++ (or C)

I have a large data structure (N > 10,000) that usually only needs to be created once (at runtime) and can be reused many times afterwards, but it needs to be loaded very quickly. (It is used for user input processing on iPhoneOS.) mmap-ing a file seems to be the best choice.

Are there any data structure libraries for C++ (or C)? Something along the lines of:

ReadOnlyHashTable<char, int> table ("filename.hash");
// mmap(...) inside the c'tor
...
int freq = table.get('a');
...
// munmap(...); inside the d'tor.

Thank you!


Details:

I've written a similar hash table class myself, but I find it pretty hard to maintain, so I would like to see if there are existing solutions. The library should:

  • Contain a creation routine that serializes the data structure into a file. This part doesn't need to be fast.
  • Contain a loading routine that mmaps a file into a read-only (or read-write) data structure that is usable within O(1) steps of processing (roughly as sketched below).
  • Use O(N) disk/memory space with a small constant factor. (The device has serious memory constraints.)
  • Add only a small time overhead to accessors (i.e., the asymptotic complexity is unchanged).
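
To make the intended usage concrete, here is a rough, error-handling-free sketch of the loading side (the flat Entry layout and the linear scan are just placeholders for whatever the library actually does):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

struct Entry { char key; int value; };   // example fixed-size record

class ReadOnlyTable {
public:
    explicit ReadOnlyTable(const char* path) {
        fd_ = open(path, O_RDONLY);
        struct stat st;
        fstat(fd_, &st);
        size_ = st.st_size;
        // A single mmap: the structure is usable immediately, no parsing.
        base_ = static_cast<const Entry*>(
            mmap(0, size_, PROT_READ, MAP_SHARED, fd_, 0));
    }
    ~ReadOnlyTable() {
        munmap(const_cast<Entry*>(base_), size_);
        close(fd_);
    }
    int get(char key) const {
        // Linear scan for brevity; a real table would hash into the file.
        for (std::size_t i = 0; i < size_ / sizeof(Entry); ++i)
            if (base_[i].key == key) return base_[i].value;
        return -1;
    }
private:
    int fd_;
    std::size_t size_;
    const Entry* base_;
};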

Assumptions:

  • Bit representation of data (e.g. endianness, encoding of float, etc.) does not matter since it is only used locally.
  • So far the possible types of data I need are integers, strings, and structs of them. Pointers do not appear.

P.S. Can Boost.intrusive help?


You could try to create a memory-mapped file and then build the STL map structure with a custom allocator. Your custom allocator simply takes the beginning of the memory of the memory-mapped file, and then increments its pointer according to the requested size. In the end, all the allocated memory lies within the memory of the memory-mapped file and should be reloadable later.

You will have to check whether memory is freed by the STL map. If it is, your custom allocator will lose some memory of the memory-mapped file, but if this is limited you can probably live with it.
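
A minimal sketch of that allocator idea, assuming C++11's simplified allocator requirements (the MappedArena name and layout are mine; alignment and bounds checks are omitted):

#include <cstddef>
#include <map>

// Shared bump region; in practice, base points into the mmap'd file.
struct MappedArena {
    char*       base;
    std::size_t used;
    std::size_t size;
};

template <typename T>
struct ArenaAllocator {
    typedef T value_type;
    MappedArena* arena;

    explicit ArenaAllocator(MappedArena* a) : arena(a) {}
    template <typename U>
    ArenaAllocator(const ArenaAllocator<U>& o) : arena(o.arena) {}

    T* allocate(std::size_t n) {
        // Bump the pointer; a real version would align and check bounds.
        T* p = reinterpret_cast<T*>(arena->base + arena->used);
        arena->used += n * sizeof(T);
        return p;
    }
    void deallocate(T*, std::size_t) {
        // Nothing: freed nodes simply leak inside the file, as noted above.
    }
};

template <typename T, typename U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b)
{ return a.arena == b.arena; }
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b)
{ return !(a == b); }

// Usage: every node of the map is placed inside the arena.
typedef std::map<char, int, std::less<char>,
                 ArenaAllocator<std::pair<const char, int> > > MappedMap;
// MappedArena arena = { static_cast<char*>(mapped_base), 0, mapped_size };
// MappedMap table(std::less<char>(), MappedMap::allocator_type(&arena));

One caveat: the map's nodes store raw pointers, so reloading the file only reproduces a valid map if it is mapped back at the same base address (e.g. with MAP_FIXED), or if the raw pointers are replaced with offset-based ones.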


Sounds like maybe you could use one of the "perfect hash" utilities out there. These spend some time optimising the hash function for the particular data, so that there are no hash collisions and (for minimal perfect hash functions) no, or at least few, empty gaps in the hash table. Obviously, this is intended to be generated rarely but used frequently.

CMPH claims to cope with large numbers of keys. However, I have never used it.

There's a good chance it only generates the hash function, leaving you to use that to generate the data structure. That shouldn't be especially hard, but it possibly still leaves you where you are now - maintaining at least some of the code yourself.
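
Untested, but going by the example in the CMPH documentation, the build step would look roughly like this; the flat value array indexed by the perfect hash (which you would then mmap at load time) is the part left to you:

#include <cmph.h>
#include <cstring>
#include <cstdio>

// Build a minimal perfect hash over the key set, then lay the values
// out in a flat array indexed by it. The dumped hash function plus
// that array is what you would mmap later.
void build(char** keys, int* values, unsigned nkeys, FILE* out) {
    cmph_io_adapter_t* source = cmph_io_vector_adapter(keys, nkeys);
    cmph_config_t* config = cmph_config_new(source);
    cmph_t* mphf = cmph_new(config);
    cmph_config_destroy(config);

    int* slots = new int[nkeys];
    for (unsigned i = 0; i < nkeys; ++i) {
        unsigned slot = cmph_search(mphf, keys[i],
                                    (cmph_uint32)std::strlen(keys[i]));
        slots[slot] = values[i];          // no collisions, no gaps
    }
    cmph_dump(mphf, out);                 // serialize the hash function
    std::fwrite(slots, sizeof(int), nkeys, out);
    delete[] slots;
    cmph_destroy(mphf);
    cmph_io_vector_adapter_destroy(source);
}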


Just thought of another option - Datadraw. Again, I haven't used this, so no guarantees, but it does claim to be a fast persistent database code generator.


WRT boost.intrusive, I've just been having a look. It's interesting. And annoying, as it makes one of my own libraries look a bit pointless.

I thought this section looked particularly relevant.

If you can use "smart pointers" for links, presumably the smart pointer type can be implemented using a simple offset-from-base-address integer (and I think that's the point of the example). An array subscript might be equally valid.
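
To illustrate, a self-relative pointer along those lines can be as simple as the following (my own sketch, with null handling omitted; Boost.Interprocess ships a complete version as offset_ptr):

#include <cstddef>

// Stores the distance to the target rather than its absolute address,
// so the structure stays valid wherever the file is mapped.
template <typename T>
class OffsetPtr {
public:
    void set(T* p) {
        off_ = reinterpret_cast<char*>(p) - reinterpret_cast<char*>(this);
    }
    T* get() const {
        return reinterpret_cast<T*>(
            const_cast<char*>(reinterpret_cast<const char*>(this)) + off_);
    }
private:
    std::ptrdiff_t off_;
};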

There's certainly unordered set/multiset support (C++ code for hash tables).


Using cmph would work. It does have the serialization machinery for the hash function itself, but you still need to serialize the keys and the data, besides adding a layer of collision resolution on top of it if your query set universe is not known beforehand. If you know all keys beforehand, then it is the way to go, since you don't need to store the keys and will save a lot of space. If not, for such a small set, I would say it is overkill.

Probably the best option is to use Google's sparse_hash_map. It has very low overhead and also has the serialization hooks that you need.

http://google-sparsehash.googlecode.com/svn/trunk/doc/sparse_hash_map.html#io
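
Going by that documentation page (untested, so check the exact signatures), saving and restoring a table with POD keys and values looks roughly like:

#include <cstdio>
#include <google/sparse_hash_map>

// Save: write the table's metadata, then the raw key/value data
// (valid here because neither keys nor values contain pointers).
void save(google::sparse_hash_map<int, int>& table, const char* path) {
    FILE* fp = std::fopen(path, "wb");
    table.write_metadata(fp);
    table.write_nopointer_data(fp);
    std::fclose(fp);
}

void load(google::sparse_hash_map<int, int>& table, const char* path) {
    FILE* fp = std::fopen(path, "rb");
    table.read_metadata(fp);
    table.read_nopointer_data(fp);
    std::fclose(fp);
}

Note that this deserializes by reading the file rather than mmap-ing it, so loading is O(N) reads instead of a single map, which may or may not be fast enough here.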


GVDB (GVariant Database), the core of dconf, is exactly this.

See git.gnome.org/browse/gvdb, dconf, and developer.gnome.org/glib/2.30/glib-GVariant.html
