Caching huge data in Process memory
I work in the finance industry. We want to rule out database hits for data processing, since they are very costly, so we are planning on-demand cache logic [ runtime insert & runtime lookup ].
Has anyone worked on implementing caching logic for more than 10 million records? Each record is about 160-200 bytes.
I faced the following disadvantages with different approaches:
- Cannot use STL std::map to implement a key-based cache registry; insert and lookup become very slow after 200,000 records.
- Shared memory or memory-mapped files are overhead for caching here, because this data is not shared across processes.
- An sqlite3 in-memory or flat-file application database could be worthwhile, but it too has slow lookups after 2-3 million records.
- Process memory might have its own kernel-imposed limits on consumption; my assumption is 2 GB on a 32-bit machine and 4 GB on a 64-bit machine.
Please suggest something if you have come across this problem and solved it by any means.
Thanks
If your cache is a simple key-value store, you should not be using std::map, which has O(log n) lookup, but std::unordered_map, which has O(1) average-case lookup. You should only use std::map if you require sorted ordering.
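As a rough sketch (the Record struct, key type, and capacity below are assumptions based on the numbers in the question), pre-sizing the hash table avoids repeated rehashing during bulk insert, which is a common cause of slow inserts at this scale:

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical fixed-size record, ~160-200 bytes as described in the question.
struct Record {
    char payload[192];
};

int main() {
    std::unordered_map<std::uint64_t, Record> cache;

    // Reserving up front avoids repeated rehashing while
    // inserting millions of entries.
    cache.reserve(10'000'000);

    cache.emplace(42, Record{});                           // runtime insert
    if (auto it = cache.find(42); it != cache.end()) {
        // runtime lookup: O(1) on average
    }
}
```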
It sounds like performance is what you're after, so you might want to look at Boost Intrusive. You can easily combine an unordered_map and a list to create a high-efficiency LRU cache, as in the sketch below.
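A minimal sketch of that structure using the standard containers instead of Boost Intrusive (the class name and interface are my own; Boost Intrusive would avoid the per-node allocations this version incurs, but the shape is the same):

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// LRU cache: a list keeps recency order (front = most recent),
// an unordered_map gives O(1) key lookup into the list.
template <typename K, typename V>
class LruCache {
    std::size_t capacity_;
    std::list<std::pair<K, V>> items_;
    std::unordered_map<K, typename std::list<std::pair<K, V>>::iterator> index_;

public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    void put(const K& key, V value) {
        if (auto it = index_.find(key); it != index_.end()) {
            it->second->second = std::move(value);
            items_.splice(items_.begin(), items_, it->second);  // promote
            return;
        }
        items_.emplace_front(key, std::move(value));
        index_[key] = items_.begin();
        if (items_.size() > capacity_) {                        // evict LRU
            index_.erase(items_.back().first);
            items_.pop_back();
        }
    }

    std::optional<V> get(const K& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        items_.splice(items_.begin(), items_, it->second);      // promote
        return it->second->second;
    }
};
```

Note that list::splice only relinks nodes, so promoting an entry on every hit is O(1) and invalidates no iterators stored in the map.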
Read everything into memory and build a red-black tree for key access.
http://www.mit.edu/~emin/source_code/cpp_trees/index.html
In one recent project, we had a database with some tens of millions of records and used this strategy.
Your data weighs about 2 GB, going by your post. With overhead it will come to roughly double that, which is no problem on any 64-bit architecture.
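For illustration, std::map in common standard library implementations is itself a red-black tree, so a bulk load might look like the following (the Record layout and file format are assumptions):

```cpp
#include <cstdint>
#include <fstream>
#include <map>

// Hypothetical record layout: 8-byte key plus payload, ~192 bytes total.
struct Record {
    std::uint64_t key;
    char payload[184];
};

std::map<std::uint64_t, Record> load_all(const char* path) {
    std::map<std::uint64_t, Record> tree;   // typically a red-black tree
    std::ifstream in(path, std::ios::binary);
    Record rec;
    while (in.read(reinterpret_cast<char*>(&rec), sizeof(rec)))
        tree.emplace_hint(tree.end(), rec.key, rec);  // O(1) if file is key-sorted
    return tree;
}
```

Each tree node adds pointer and allocator overhead on top of the record itself, which is where the "roughly double" estimate comes from.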
I recently changed the memory allocation of our product (a 3D medical volume viewer) to use good old memory-mapped files.
The advantages were:
- I can use all physical RAM if I like (my 32-bit app sometimes needs more than 4 GB on a 64-bit machine).
- If you map only portions, your address space stays largely free for your application to use, which improves reliability.
- If you run out of memory, things just slow down; there are no crashes.
In my case it was just data (mostly read-only). If you have a more complex data structure, this will be more work than using "normal" objects.
You can actually share these mappings across processes (if they're backed by a real file). That may behave differently; I don't have experience with it.
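A minimal POSIX sketch of this approach (the file name and record layout are assumptions; on Windows you would use CreateFileMapping/MapViewOfFile instead):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical fixed-size record; the file is treated as a flat array.
struct Record {
    std::uint64_t key;
    char payload[184];
};

int main() {
    const char* path = "records.dat";        // assumed data file
    int fd = open(path, O_RDONLY);
    if (fd < 0) return 1;

    struct stat st;
    fstat(fd, &st);

    // Map the whole file read-only; the OS pages data in on demand
    // and can drop clean pages under memory pressure instead of crashing.
    void* base = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) return 1;

    const Record* records = static_cast<const Record*>(base);
    std::size_t count = st.st_size / sizeof(Record);
    if (count > 0)
        std::printf("first key: %llu\n",
                    static_cast<unsigned long long>(records[0].key));

    munmap(base, st.st_size);
    close(fd);
}
```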