
Algorithm to store Item-to-Item-Associations

I need some help storing some data efficiently. I have a large list of objects (about 100,000) and want to store associations between these items with a coefficient. Not all items are associated; in fact, I have about 1 million associations. I need fast access to these associations when referencing by the two items. What I did is a structure like this:

Map<Item, Map<Item, Float>>

I tried this with HashMap and Hashtable. Both work fine and are fast enough. My problem is that all those Maps create a lot of memory overhead; concretely, more than 300 MB for the given scenario. Is there a Map implementation with a smaller footprint? Is there maybe a better algorithm to store this kind of data?
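
For concreteness, this is roughly how that structure is built and queried (AssociationStore is a placeholder name; Item is the question's own class, assumed to implement equals/hashCode):

    import java.util.HashMap;
    import java.util.Map;

    class AssociationStore {
        // One inner map per item; roughly 1 million entries in total.
        private final Map<Item, Map<Item, Float>> associations = new HashMap<>();

        void put(Item a, Item b, float coefficient) {
            associations.computeIfAbsent(a, k -> new HashMap<>()).put(b, coefficient);
        }

        // Returns null when the two items are not associated.
        Float get(Item a, Item b) {
            Map<Item, Float> inner = associations.get(a);
            return inner == null ? null : inner.get(b);
        }
    }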


Here are some ideas:

  1. Store in a Map<Pair<Item,Item>, Float>. If you are worried about allocating a new Pair for each lookup, and your code is synchronized, you can keep a single reusable lookup Pair instance (see the first sketch after this list).

  2. Loosen the outer map to be Map<Item, ?>. The value can be a simple {Item, Float} tuple for the first association, then a small tuple array while the number of associations stays low, and finally a full-fledged Map once it grows.

  3. Use Commons Collections' Flat3Map for the inner maps.

  4. If you are in tight control of the Items, and Item equivalence is referential (i.e. each Item instance is not equal to any other Item instance), you can number each instance. Since you are talking about fewer than 2 billion instances, two int IDs fit into a single long with some bit manipulation. The map then gets much smaller if you use Trove's TLongObjectHashMap (see the second sketch after this list).
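
A minimal sketch of idea 1, assuming Item implements equals/hashCode; PairKeyStore and ItemPair are made-up names:

    import java.util.HashMap;
    import java.util.Map;

    class PairKeyStore {
        // Immutable composite key; equals/hashCode make it usable as a map key.
        static final class ItemPair {
            final Item a, b;
            ItemPair(Item a, Item b) { this.a = a; this.b = b; }

            @Override public boolean equals(Object o) {
                if (!(o instanceof ItemPair)) return false;
                ItemPair p = (ItemPair) o;
                return a.equals(p.a) && b.equals(p.b);
            }

            @Override public int hashCode() {
                return 31 * a.hashCode() + b.hashCode();
            }
        }

        private final Map<ItemPair, Float> associations = new HashMap<>();

        void put(Item a, Item b, float coefficient) {
            associations.put(new ItemPair(a, b), coefficient);
        }

        // Returns null when the two items are not associated.
        Float get(Item a, Item b) {
            return associations.get(new ItemPair(a, b));
        }
    }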
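
And a minimal sketch of idea 4, assuming each Item has already been numbered with a non-negative int ID. Since the values here are plain floats, this sketch substitutes Trove's primitive TLongFloatHashMap for the TLongObjectHashMap mentioned above, which avoids boxing the values as well; PackedKeyStore is a made-up name:

    import gnu.trove.map.hash.TLongFloatHashMap;

    class PackedKeyStore {
        private final TLongFloatHashMap associations = new TLongFloatHashMap();

        // Pack two int IDs into one long:
        // high 32 bits = first item's ID, low 32 bits = second item's ID.
        private static long key(int idA, int idB) {
            return ((long) idA << 32) | (idB & 0xFFFFFFFFL);
        }

        void put(int idA, int idB, float coefficient) {
            associations.put(key(idA, idB), coefficient);
        }

        // Returns Trove's no-entry value (0.0f by default) for absent pairs;
        // check containsKey(key(idA, idB)) first if 0 is a legal coefficient.
        float get(int idA, int idB) {
            return associations.get(key(idA, idB));
        }
    }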


You have two options.

1) Reduce what you're storing.

If your data is calculable, using a WeakHashMap will allow the garbage collector to remove members. You will probably want to decorate it with a mechanism that calculates lost or absent key/value pairs on the fly. This is basically a cache.
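A minimal sketch of that decoration, assuming the coefficients are cheap enough to recalculate on a miss; WeakCache and compute are hypothetical names:

    import java.util.Map;
    import java.util.WeakHashMap;
    import java.util.function.Function;

    // A compute-on-miss cache: once a key is no longer strongly referenced
    // elsewhere, the garbage collector may drop its entry, and the value is
    // simply recalculated on the next lookup.
    class WeakCache<K, V> {
        private final Map<K, V> cache = new WeakHashMap<>();
        private final Function<K, V> compute;

        WeakCache(Function<K, V> compute) {
            this.compute = compute;
        }

        synchronized V get(K key) {
            return cache.computeIfAbsent(key, compute);
        }
    }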

Another possibility that might trim a relatively tiny amount of RAM is to instruct your JVM to use compressed object pointers. That may save you about 3 MB with your current data size.
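Compressed oops are controlled by a standard HotSpot flag (and are on by default on modern 64-bit JVMs for heaps under roughly 32 GB):

    java -XX:+UseCompressedOops -jar app.jar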

2) Expand your capacity.

I'm not sure what your constraint is (run-time memory on a desktop, serialization, etc.), but you can either expand the heap size and deal with it, or you can push the data out of process. With all those "NoSQL" stores out there, one will probably fit your needs. Or an indexed DB table can be quite fast. If you're looking for a simple key-value store, Voldemort is extremely easy to set up and integrate.
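
For the heap route, that is just the standard -Xmx flag, e.g.:

    java -Xmx2g -jar app.jar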

However, I don't know what you're doing with your working set. Can you give more details? Are you performing aggregations, partitioning, cluster analysis, etc.? Where are you running into trouble?
