ArrayList<ArrayList<String>> runs out of memory (Java heap space). Any other option?

I am working with the ArrayList data structure to process a CSV file. My machine is pretty powerful: Memory: 8 GB of RAM; Processor: 4 CPUs (Intel Core i5, 2.5 GHz).

In Eclipse, I assigned -Xmx5120m (5 GB of RAM for the Java VM) using the VM arguments panel under Run As -> Run Configurations.

I am still getting "java.lang.OutOfMemoryError: Java heap space" for my ArrayList<ArrayList<String>> once it grows beyond roughly 468000 X 108. I am using ArrayList because I feel most comfortable with it and it makes it easy to process the data for my purpose.

Actually, I am using this 2-dimensional array in a column-based way, like

arraylist.get(i).get(0)

where

0 <= i < 468000

would represent one column. Since I do operations like replacing a column with another column, copying a column, inserting a column at an arbitrary position in the ArrayList, etc., I could only think of ArrayList, because it has amortized constant time for adding or inserting in the average case.

So now my question is:

Which other data structures could I use instead of ArrayList to reach sizes much larger than 468000 X 108 (for example, (833 * 1000000) X 108) and still be able to do all the operations mentioned above? (I still want to be able to do this on my machine with the capacity I have.)

I could think of doing all this sequentially: processing the first 468000 X 108 block and writing it to a CSV file, then loading the next block into the 468000 X 108 ArrayList and writing it to a different file, and so on.

I don't think I have reached the limit of what ArrayList can handle with my capacity.

I would appreciate any kind of help.


You are trying to stuff a file with 468,000 lines (about 50.5 million cells at 108 columns each) into 5 GB of memory, and you are running out of memory.

The data structure isn't the problem.

You need to change your approach and not do that. Process the file one chunk at a time, extract only the data you need, etc.
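A minimal sketch of chunked processing, assuming a plain comma-separated file without quoted fields that contain commas. The class ChunkedCsv, the chunk size and the processChunk hook (here it just swaps the first two columns as an example operation) are hypothetical; the point is that only one chunk of rows is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ChunkedCsv {
    // Reads `in` line by line, hands at most `chunkSize` rows at a time to processChunk,
    // and appends the result to `out`. Returns the number of rows read.
    static long process(Path in, Path out, int chunkSize) throws IOException {
        long rows = 0;
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            List<String[]> chunk = new ArrayList<>(chunkSize);
            String line;
            while ((line = reader.readLine()) != null) {
                // Naive split (assumption): no quoted fields containing commas.
                // The -1 limit keeps trailing empty fields.
                chunk.add(line.split(",", -1));
                rows++;
                if (chunk.size() == chunkSize) {
                    writeChunk(processChunk(chunk), writer);
                    chunk.clear(); // drop the processed rows so they can be garbage-collected
                }
            }
            if (!chunk.isEmpty()) {
                writeChunk(processChunk(chunk), writer); // last, partial chunk
            }
        }
        return rows;
    }

    // Hypothetical per-chunk operation: swaps the first two columns of every row.
    static List<String[]> processChunk(List<String[]> chunk) {
        for (String[] row : chunk) {
            if (row.length >= 2) {
                String tmp = row[0];
                row[0] = row[1];
                row[1] = tmp;
            }
        }
        return chunk;
    }

    // Writes each row back out as a comma-separated line.
    static void writeChunk(List<String[]> chunk, BufferedWriter writer) throws IOException {
        for (String[] row : chunk) {
            writer.write(String.join(",", row));
            writer.newLine();
        }
    }
}
```

Something like ChunkedCsv.process(Paths.get("input.csv"), Paths.get("output.csv"), 100000) would keep at most 100,000 rows in memory. Operations that only touch one row at a time (replacing, copying or inserting a column) fit this pattern well; anything that needs a whole column at once does not.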


Inserting somewhere within an ArrayList won't give you amortized constant time, because every element after the insertion point has to be shifted (copied) internally; amortized constant time only holds as long as you append at the end.
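To see why, here is a simplified model of what an array-backed list has to do for add(index, value). It is a sketch, not the actual ArrayList source, and the class name ArrayInsertSketch is made up: everything from index to the end is shifted one slot with System.arraycopy, so the cost grows with the number of elements after the insertion point.

```java
class ArrayInsertSketch {
    // Inserts value at index in data[0..size), shifting the tail one slot to the right,
    // and returns how many element references had to be moved.
    static int insert(Object[] data, int size, int index, Object value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        if (size == data.length) {
            // A real ArrayList would grow its backing array here; this sketch keeps it fixed.
            throw new IllegalStateException("backing array is full");
        }
        int moved = size - index;
        // arraycopy handles overlapping source and destination ranges correctly.
        System.arraycopy(data, index, data, index + 1, moved);
        data[index] = value;
        return moved;
    }
}
```

Appending (index == size) moves nothing, which is where the amortized constant time comes from; inserting at position 0 of a 468,000-element list copies all 468,000 references. With one inner list per row, inserting a column means doing such a shift in every one of the 468,000 row lists.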

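Besides, when the ArrayList has to grow, it calculates the new capacity like this (JDK 6 source; later JDKs use oldCapacity + (oldCapacity >> 1), which is the same roughly 1.5 times growth):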

  int newCapacity = (oldCapacity * 3)/2 + 1;

which could waste huge amounts of memory in your case. It would be more efficient to use exactly sized String arrays instead of the list (or at least call trimToSize() once you're done reading a column).
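A small simulation of the growth rule quoted above, assuming a list created with new ArrayList() (initial capacity 10) and filled only by appending; the class and method names are hypothetical. It returns the backing-array capacity after n appends and the number of unused slots.

```java
class GrowthSketch {
    // new ArrayList() starts with room for 10 elements.
    static final int DEFAULT_CAPACITY = 10;

    // Capacity after n add() calls under the JDK 6 rule
    // newCapacity = (oldCapacity * 3) / 2 + 1; long avoids int overflow in the sketch.
    static long capacityAfterAppends(long n) {
        long capacity = DEFAULT_CAPACITY;
        while (capacity < n) {
            capacity = (capacity * 3) / 2 + 1;
        }
        return capacity;
    }

    // Slots allocated but unused once n elements have been appended.
    static long wastedSlots(long n) {
        return capacityAfterAppends(n) - n;
    }
}
```

Under this JDK 6 rule, a 108-element row ends up with a capacity of 133, i.e. 25 unused references per row, about 11.7 million across 468,000 rows. Creating rows with new ArrayList<String>(108), or calling trimToSize() once a list is complete, avoids that slack.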

As long as you only need a few columns at a time, I'd suggest storing each column in a separate file, which you can load/write on demand. If they only contain strings, you could think of some easily readable binary format and use DataOutputStream and DataInputStream, for instance. Inserting a column would then simply become a file renaming operation... You could also add some caching to keep the most recently or most often used columns in memory (search for java.util.LinkedHashMap to get an idea of a simple LRU cache). Don't use a database if you don't need transactions or the like, and don't store such data in a verbose format like XML; you'd get a huge performance loss otherwise.
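A sketch of the per-column file idea plus an LRU cache. The file layout (an int count followed by one writeUTF string per cell) and the class names are my assumptions; note that writeUTF limits each value to 65,535 encoded bytes and does not accept null. The cache uses LinkedHashMap's access-order constructor together with removeEldestEntry, the usual way to build a simple LRU cache on top of it.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

class ColumnStore {
    // Assumed layout: [int count][count x modified-UTF-8 string]; values must be non-null.
    static void writeColumn(Path file, String[] column) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(column.length);
            for (String value : column) {
                out.writeUTF(value);
            }
        }
    }

    // Reads a column written by writeColumn back into an exactly sized array.
    static String[] readColumn(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            String[] column = new String[in.readInt()];
            for (int i = 0; i < column.length; i++) {
                column[i] = in.readUTF();
            }
            return column;
        }
    }

    // LRU cache: evicts the least recently accessed column once more than maxColumns are held.
    static final class ColumnCache extends LinkedHashMap<String, String[]> {
        private final int maxColumns;

        ColumnCache(int maxColumns) {
            super(16, 0.75f, true); // true = iteration order is access order
            this.maxColumns = maxColumns;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String[]> eldest) {
            return size() > maxColumns;
        }
    }
}
```

Keys could be the column file names; with new ColumnStore.ColumnCache(10), at most ten columns stay in memory, and the least recently accessed one is dropped when an eleventh is added.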

Finally, I'd think about the content of the matrix, as strings can become pretty huge: do you really need them as strings, or can you create a less memory-consuming representation of them? For instance, if you only have 60,000 different strings, you could create a mapping between them and a short (16 bits, enough for 65,536 distinct values when read as unsigned), and work with the shorts in memory.
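A minimal sketch of such a dictionary encoding, assuming at most 65,536 distinct strings; StringDictionary is a hypothetical name. Since Java's short is signed, codes above 32,767 are stored as negative values and converted back with Short.toUnsignedInt (Java 8+).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StringDictionary {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    // Returns the 16-bit code for value, assigning the next free code on first sight.
    short encode(String value) {
        Integer id = ids.get(value);
        if (id == null) {
            if (values.size() > 0xFFFF) {
                throw new IllegalStateException("more than 65536 distinct strings");
            }
            id = values.size();
            ids.put(value, id);
            values.add(value);
        }
        return (short) id.intValue(); // codes 32768..65535 wrap to negative shorts
    }

    // Maps a code back to its string, treating the short as unsigned.
    String decode(short code) {
        return values.get(Short.toUnsignedInt(code));
    }

    // Encodes a whole column into a compact short[] (2 bytes per cell).
    short[] encodeColumn(String[] column) {
        short[] codes = new short[column.length];
        for (int i = 0; i < column.length; i++) {
            codes[i] = encode(column[i]);
        }
        return codes;
    }
}
```

At 2 bytes per cell, 468,000 X 108 codes take about 101 MB (50,544,000 cells times 2 bytes) plus the dictionary itself, which is far less than one String object per cell.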


A good way to "change your approach", as others have suggested, is to persist your data in a database or XML file and then work with smaller subsets of that data as you need them.
