
Caching huge data in Process memory

I work in the finance industry. We want to avoid database hits during data processing because they are very costly, so we are planning on-demand cache logic [runtime insert & runtime lookup].

Has anyone worked on implementing caching logic for more than 10 million records? Each record is about 160-200 bytes.

I ran into the following disadvantages with different approaches:

  1. I cannot use the STL std::map to implement a key-based cache registry; insert and lookup become very slow after about 200,000 records.
  2. Shared memory or memory-mapped files feel like overhead for caching here, because the data is not shared across processes.
  3. Using sqlite3 with an in-memory or flat-file application database might be worthwhile, but it too has slow lookups after 2-3 million records.
  4. Process memory might have limits imposed by the kernel; my assumption is 2 GB on a 32-bit machine and 4 GB on a 64-bit machine.

Please suggest something if you have come across this problem and solved it by any means.

Thanks


If your cache is a simple key-value store, you should not be using std::map, which has O(log n) lookup, but std::unordered_map, which has average O(1) lookup. You should only use std::map if you need the keys kept in sorted order.
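For record counts like yours, pre-sizing the hash table also matters. A minimal sketch, assuming fixed-size ~200-byte records keyed by a 64-bit id (the real record layout is not given in the question):

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical record layout; the real 160-200 byte record is not shown in
// the question, so this struct is only an illustration.
struct Record {
    char payload[192];
};

int main() {
    std::unordered_map<std::uint64_t, Record> cache;
    cache.reserve(10'000'000);   // pre-size the buckets so bulk insert avoids rehashing

    Record r{};
    cache.emplace(42u, r);       // runtime insert

    auto it = cache.find(42u);   // average O(1) runtime lookup
    if (it != cache.end()) {
        // use it->second
    }
}
```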

It sounds like performance is what you're after, so you might want to look at Boost.Intrusive. You can easily combine an unordered map and a list to create a high-efficiency LRU.
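A minimal LRU sketch along those lines, using std::unordered_map plus std::list as a standard-library stand-in for the Boost.Intrusive combination (key/value types and the capacity are placeholders):

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

template <typename K, typename V>
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    void put(const K& key, const V& value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = value;
            items_.splice(items_.begin(), items_, it->second);  // move to front
            return;
        }
        if (items_.size() == capacity_) {            // evict least-recently-used
            index_.erase(items_.back().first);
            items_.pop_back();
        }
        items_.emplace_front(key, value);
        index_[key] = items_.begin();
    }

    std::optional<V> get(const K& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        items_.splice(items_.begin(), items_, it->second);       // mark as recently used
        return it->second->second;
    }

private:
    std::size_t capacity_;
    std::list<std::pair<K, V>> items_;                            // front = most recent
    std::unordered_map<K, typename std::list<std::pair<K, V>>::iterator> index_;
};
```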


Read everything into memory and create a red-black tree for key access.

http://www.mit.edu/~emin/source_code/cpp_trees/index.html

In one recent project we had a database with some tens of millions of records and used exactly this strategy.

From your post, your data weighs in at about 2 GB (10 million records at ~200 bytes each). With overhead it may roughly double, which is no problem on any 64-bit architecture.
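A sketch of that load-everything approach, using std::map (a red-black tree in the common standard-library implementations); the binary file name and record layout are assumptions for illustration only:

```cpp
#include <cstdint>
#include <fstream>
#include <map>

// ~200-byte record as described in the question; the exact layout is assumed.
struct Record {
    std::uint64_t key;
    char payload[192];
};

int main() {
    std::map<std::uint64_t, Record> index;     // ordered key access, O(log n)

    std::ifstream in("records.bin", std::ios::binary);
    Record r;
    while (in.read(reinterpret_cast<char*>(&r), sizeof r)) {
        index.emplace(r.key, r);               // 10M records * ~200 B ~= 2 GB + node overhead
    }

    auto it = index.find(12345u);
    if (it != index.end()) {
        // use it->second
    }
}
```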


I have recently changed the memory allocation of our product (a 3D medical volume viewer) to use good old memory-mapped files.

The advantages were:

  • I can use all the physical RAM if I like (my 32-bit app sometimes needs more than 4 GB on a 64-bit machine).
  • If you map only portions, your address space stays largely free for your application to use, which improves reliability.
  • If you run out of memory, things just slow down; there are no crashes.

In my case it was just data (mostly read-only). If you have a more complex data structure, this will be more work than using "normal" objects.

You can actually share these across processes (if they're backed by a real file). That may behave differently; I don't have experience with it.
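A minimal POSIX mmap sketch of this read-only, file-backed approach; the file name and record layout are assumptions, and error handling is kept to a minimum:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Assumed fixed-size record layout, matching the ~200-byte records in the question.
struct Record {
    std::uint64_t key;
    char payload[192];
};

int main() {
    int fd = open("records.bin", O_RDONLY);
    if (fd < 0) return 1;

    struct stat st{};
    fstat(fd, &st);

    // The kernel pages data in on demand; under memory pressure it evicts
    // clean pages rather than the process exhausting its heap.
    void* base = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) return 1;

    const Record* records = static_cast<const Record*>(base);
    std::size_t count = st.st_size / sizeof(Record);
    if (count > 0)
        std::printf("first key: %llu\n", (unsigned long long)records[0].key);

    munmap(base, st.st_size);
    close(fd);
}
```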
