Reducing memory usage of very large HashMap

I have a very large hash map (2+ million entries) that is created by reading in the contents of a CSV file. Some information:

  1. The HashMap maps a String key (which is less than 20 chars) to a String value (which is approximately 50 characters).
  2. This HashMap is initialized with an initial capacity of 3 million, so the load factor is around 0.66.
  3. The HashMap is only used by a single operation; once that operation completes, I call clear() on it. (It doesn't appear that clear() actually frees the memory. Is a separate call to System.gc() necessary?)

One idea I had was to change the HashMap<String, String> to a HashMap<Integer, String>, using the hashCode of the String as the key. That would save a bit of memory, but it risks collisions if two strings have identical hash codes. How likely is that for strings that are less than 20 characters long?

Does anyone else have ideas on what to do here? The CSV file itself is only 100 MB, but Java ends up using over 600 MB of memory for this HashMap.

Thanks!


It sounds like you have the framework to try this already. Instead of adding the string, add the string.hashCode() and see if you get collisions.
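
A minimal sketch of that experiment, assuming the keys live in the first column of a file called keys.csv (both the file name and the layout are placeholders): map each hashCode() to the first key that produced it, and count the distinct keys that land on an already-used code.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;

    public class CollisionCheck {
        public static void main(String[] args) throws Exception {
            Map<Integer, String> seen = new HashMap<>(3_000_000);
            long collisions = 0;
            try (BufferedReader in = new BufferedReader(new FileReader("keys.csv"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String key = line.split(",", 2)[0];  // assumed: key in first column
                    String prev = seen.putIfAbsent(key.hashCode(), key);
                    if (prev != null && !prev.equals(key)) {
                        collisions++;  // a different key already claimed this hash code
                    }
                }
            }
            System.out.println("Distinct keys colliding: " + collisions);
        }
    }

For 2 million keys hashed into 2^32 possible int values, the birthday approximation (n^2 / 2m) predicts on the order of a few hundred collisions even with a perfectly uniform hash, so you should plan for them rather than hope they don't occur.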

In terms of freeing up memory, the JVM generally doesn't shrink the heap it has already claimed, but it will garbage collect internally when it needs the space.
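
One detail worth knowing: clear() empties the entries, but the map object, including its large internal bucket array, stays alive for as long as it is reachable. A sketch of how you might release it entirely:

    import java.util.HashMap;
    import java.util.Map;

    public class Releasing {
        private Map<String, String> lookup = new HashMap<>(3_000_000);

        void runOperation() {
            // ... populate and use lookup ...

            // clear() nulls the entries but keeps the (large) internal
            // bucket array for as long as the map itself is reachable.
            lookup.clear();

            // Dropping the last reference makes the whole structure
            // eligible for collection. System.gc() is only a hint to
            // the JVM and is rarely worth calling explicitly.
            lookup = null;
        }
    }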

Also, it sounds like you might have an algorithm that doesn't need the hash table at all. Could you describe what you're trying to do in a little more detail?


Parse the CSV, and build a Map whose keys are your existing keys, but whose values are Integer pointers to each key's location in the file.

When you want the value for a key, find the index in the map, then use a RandomAccessFile to read that line from the file. Keep the RandomAccessFile open during processing, then close it when done.
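
A sketch of that scheme, with two liberties taken: it stores long byte offsets rather than Integer line numbers, so seek() can jump straight to the row, and it assumes a simple key,value layout with no embedded commas in the key (the file layout is a placeholder):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.HashMap;
    import java.util.Map;

    public class CsvIndex implements AutoCloseable {
        private final RandomAccessFile file;
        private final Map<String, Long> offsets = new HashMap<>(3_000_000);

        CsvIndex(String path) throws IOException {
            file = new RandomAccessFile(path, "r");
            long pos = file.getFilePointer();
            String line;
            while ((line = file.readLine()) != null) {
                offsets.put(line.split(",", 2)[0], pos);  // key -> start of its row
                pos = file.getFilePointer();
            }
        }

        // Seek to the recorded offset and re-read the row on demand.
        String get(String key) throws IOException {
            Long pos = offsets.get(key);
            if (pos == null) return null;
            file.seek(pos);
            return file.readLine().split(",", 2)[1];  // value after the first comma
        }

        @Override
        public void close() throws IOException {
            file.close();
        }
    }

The heap then holds only the short keys plus one boxed Long each, and every lookup costs a disk seek instead, which is usually a good trade when the values dominate the memory footprint.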


What you are trying to do is exactly a JOIN operation. Consider an in-memory database like H2: load both CSV files into temporary tables, then JOIN over them. In my experience H2 handles load operations well, and this will likely be faster and less memory-intensive than your manual HashMap-based join.
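
A sketch of that approach, assuming the H2 jar is on the classpath and two files left.csv and right.csv with header rows naming ID and VAL columns (the file and column names are placeholders). H2's built-in CSVREAD function loads a CSV directly into a table, after which the join is plain SQL:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CsvJoin {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:join");
                 Statement st = conn.createStatement()) {
                // CSVREAD pulls each file straight into a table.
                st.execute("CREATE TABLE a AS SELECT * FROM CSVREAD('left.csv')");
                st.execute("CREATE TABLE b AS SELECT * FROM CSVREAD('right.csv')");
                st.execute("CREATE INDEX idx_b ON b(ID)");  // speed up the join

                try (ResultSet rs = st.executeQuery(
                        "SELECT a.ID, a.VAL, b.VAL FROM a JOIN b ON a.ID = b.ID")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + "," +
                                           rs.getString(2) + "," + rs.getString(3));
                    }
                }
            }
        }
    }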


If performance isn't the primary concern, store the entries in a database instead. Then memory isn't a concern, and you have good, if not great, search speed thanks to the database.
