Reducing memory usage of very large HashMap

I have a very large hash map (2+ million entries) that is created by reading in the contents of a CSV file. Some information:

  1. The HashMap maps a String key (which is less than 20 chars) to a String value (which is approximately 50 characters).
  2. This HashMap is initialized with an initial capacity of 3 million, so the load factor is around 0.66.
  3. The HashMap is only used by a single operation; once that operation completes, I call clear() on it. (It doesn't appear that clear() actually frees the memory. Is a separate call to System.gc() necessary?)

One idea I had was to change the HashMap<String, String> to a HashMap<Integer, String>, using the hashCode of the String as the key. That would save a bit of memory, but it risks collisions if two strings have identical hash codes. How likely is that for strings that are less than 20 characters long?

Does anyone else have ideas on what to do here? The CSV file itself is only 100 MB, but Java ends up using over 600 MB of memory for this HashMap.

Thanks!


It sounds like you have the framework to try this already. Instead of adding the string, add the string.hashCode() and see if you get collisions.
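
A minimal sketch of that experiment, assuming the keys live in the first column of a file called keys.csv (both the file name and the layout are placeholders): map each hashCode() to the first key that produced it, and count the distinct keys that land on an already-used code.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;

    public class CollisionCheck {
        public static void main(String[] args) throws Exception {
            Map<Integer, String> seen = new HashMap<>(3_000_000);
            long collisions = 0;
            try (BufferedReader in = new BufferedReader(new FileReader("keys.csv"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String key = line.split(",", 2)[0];  // assumed: key in first column
                    String prev = seen.putIfAbsent(key.hashCode(), key);
                    if (prev != null && !prev.equals(key)) {
                        collisions++;  // a different key already claimed this hash code
                    }
                }
            }
            System.out.println("Distinct keys colliding: " + collisions);
        }
    }

For 2 million keys hashed into 2^32 possible int values, the birthday approximation (n^2 / 2m) predicts on the order of a few hundred collisions even with a perfectly uniform hash, so you should plan for them rather than hope they don't occur.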

In terms of freeing up memory, the JVM generally doesn't shrink the heap it has already claimed, but it will garbage collect internally when it needs the space.
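
One detail worth knowing: clear() empties the entries, but the map object, including its large internal bucket array, stays alive for as long as it is reachable. A sketch of how you might release it entirely:

    import java.util.HashMap;
    import java.util.Map;

    public class Releasing {
        private Map<String, String> lookup = new HashMap<>(3_000_000);

        void runOperation() {
            // ... populate and use lookup ...

            // clear() nulls the entries but keeps the (large) internal
            // bucket array for as long as the map itself is reachable.
            lookup.clear();

            // Dropping the last reference makes the whole structure
            // eligible for collection. System.gc() is only a hint to
            // the JVM and is rarely worth calling explicitly.
            lookup = null;
        }
    }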

Also, it sounds like you might have an algorithm that doesn't need the hash table at all. Could you describe what you're trying to do in a little more detail?


Parse the CSV, and build a Map whose keys are your existing keys, but whose values are Integer pointers to each key's location in the file.

When you want the value for a key, find the index in the map, then use a RandomAccessFile to read that line from the file. Keep the RandomAccessFile open during processing, then close it when done.
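
A sketch of that scheme, with two liberties taken: it stores long byte offsets rather than Integer line numbers, so seek() can jump straight to the row, and it assumes a simple key,value layout with no embedded commas in the key (the file layout is a placeholder):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.HashMap;
    import java.util.Map;

    public class CsvIndex implements AutoCloseable {
        private final RandomAccessFile file;
        private final Map<String, Long> offsets = new HashMap<>(3_000_000);

        CsvIndex(String path) throws IOException {
            file = new RandomAccessFile(path, "r");
            long pos = file.getFilePointer();
            String line;
            while ((line = file.readLine()) != null) {
                offsets.put(line.split(",", 2)[0], pos);  // key -> start of its row
                pos = file.getFilePointer();
            }
        }

        // Seek to the recorded offset and re-read the row on demand.
        String get(String key) throws IOException {
            Long pos = offsets.get(key);
            if (pos == null) return null;
            file.seek(pos);
            return file.readLine().split(",", 2)[1];  // value after the first comma
        }

        @Override
        public void close() throws IOException {
            file.close();
        }
    }

The heap then holds only the short keys plus one boxed Long each, and every lookup costs a disk seek instead, which is usually a good trade when the values dominate the memory footprint.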


What you are trying to do is exactly a JOIN operation. Consider an in-memory database like H2: load both CSV files into temporary tables, then JOIN over them. In my experience H2 handles load operations well, and this will likely be faster and less memory-intensive than your manual HashMap-based join.
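
A sketch of that approach, assuming the H2 jar is on the classpath and two files left.csv and right.csv with header rows naming ID and VAL columns (the file and column names are placeholders). H2's built-in CSVREAD function loads a CSV directly into a table, after which the join is plain SQL:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CsvJoin {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:join");
                 Statement st = conn.createStatement()) {
                // CSVREAD pulls each file straight into a table.
                st.execute("CREATE TABLE a AS SELECT * FROM CSVREAD('left.csv')");
                st.execute("CREATE TABLE b AS SELECT * FROM CSVREAD('right.csv')");
                st.execute("CREATE INDEX idx_b ON b(ID)");  // speed up the join

                try (ResultSet rs = st.executeQuery(
                        "SELECT a.ID, a.VAL, b.VAL FROM a JOIN b ON a.ID = b.ID")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + "," +
                                           rs.getString(2) + "," + rs.getString(3));
                    }
                }
            }
        }
    }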


If performance isn't the primary concern, store the entries in a database instead. Then memory isn't a concern, and you have good, if not great, search speed thanks to the database.
