Random-access container that does not fit in memory?
I have an array of objects (say, images), which is too large to fit into memory (e.g. 40GB). But my code needs to be able to randomly access these objects at runtime.
What is the best way to do this?
From my code's point of view, it shouldn't matter, of course, if some of the data is on disk or temporarily stored in memory; it should have transparent access:
container.getObject(1242)->process();
container.getObject(479431)->process();
But how should I implement this container? Should it just send the requests to a database? If so, which one would be the best option? (If a database, then it should be free and not too much administration hassle, maybe Berkeley DB or SQLite?)
Should I just implement it myself, caching objects after access and purging the memory when it's full? Or are there good libraries (C++) for this out there?
The requirements for the container would be that it minimizes disk access (some elements might be accessed more frequently by my code, so they should be kept in memory) and allows fast access.
UPDATE: It turns out that STXXL does not work for my problem, because the objects I store in the container have dynamic size, i.e. my code may update them (increasing or decreasing the size of some objects) at runtime. But STXXL cannot handle that:
STXXL containers assume that the data types they store are plain old data types (POD). http://algo2.iti.kit.edu/dementiev/stxxl/report/node8.html
Could you please comment on other solutions? What about using a database? And which one?
Consider using the STXXL:
The core of STXXL is an implementation of the C++ standard template library STL for external memory (out-of-core) computations, i.e., STXXL implements containers and algorithms that can process huge volumes of data that only fit on disks. While the compatibility to the STL supports ease of use and compatibility with existing applications, another design priority is high performance.
You could look into memory mapped files, and then access one of those too.
I would implement a basic cache. With this working-set size you will get the best results with a set-associative cache with x-byte cache lines (x == whatever best matches your access pattern). Just implement in software what every modern processor already has in hardware. This should, IMHO, give you the best results. You could then optimize further if you can make the access pattern somewhat linear.
One solution is to use a structure similar to a B-Tree, indices and "pages" of arrays or vectors. The concept is that the index is used to determine which page to load into memory to access your variable.
If you make the page size smaller, you can store multiple pages in memory. A caching system based on frequency of use or other rule, will reduce the number of page loads.
I've seen some very clever code that overloads operator[]() to perform disk access on the fly and load the required data from disk/database transparently.