
How to store (and read) large arrays/maps/whatever in Java?

Can anyone help me with the following problem? I need to permanently save what I currently keep in arrays, so that I can use the data for calculations later. An example is explained below.

1. I generate a long[][] which is far too big for my computer's RAM. It is generated one row after the other.

2. I calculate something from my long[][] and save the results in a double[][], which is also too big for my RAM. I do not need the entire long[][] at once, as only a small batch of rows is used in each calculation, and one row of the double[][] is filled for each batch.

3. I need to sort the double[][] and do a lot of other things not important here.

4. I repeat steps 2 and 3 for a number of iterations (largish, >10000), which means I care about the speed of both access and sorting.

I know the size of the arrays, but obviously I cannot initialize them, both because they are too big and because the size must be given as an int (so far, I can only run "small" calculations). Of course, I could use Maps etc., but I have failed to get this working, and I do not understand which kind(s) I should use; I have never used maps/collections before. If I use maps, I can use one of the columns in the arrays as keys, as that column is identical in both arrays (except for the type). The key could simply be the row number (expressed as a long).
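The int limit is a property of Java arrays themselves: lengths and indices are `int`, so one array holds at most about `Integer.MAX_VALUE` (2^31 - 1) elements. Once the data lives on disk instead, positions have to be computed with `long` arithmetic. The helper below is a minimal sketch; the class name `LongMatrixLayout` and the row-major, 8-bytes-per-value file layout are assumptions for illustration.

```java
/**
 * Hypothetical helper: size and offset arithmetic for a row-major long[rows][cols]
 * stored in a file (8 bytes per value, no header). Everything is computed in long,
 * because the equivalent int products overflow for large matrices.
 */
final class LongMatrixLayout {
    static final long BYTES_PER_LONG = Long.BYTES; // 8

    private LongMatrixLayout() {}

    /** Total bytes needed for rows x cols longs; throws ArithmeticException on long overflow. */
    static long bytesNeeded(long rows, int cols) {
        return Math.multiplyExact(Math.multiplyExact(rows, (long) cols), BYTES_PER_LONG);
    }

    /** Byte offset of element (row, col); row is a long, so rows beyond the int range work. */
    static long byteOffset(long row, int cols, int col) {
        if (col < 0 || col >= cols) {
            throw new IndexOutOfBoundsException("col " + col);
        }
        return (row * cols + col) * BYTES_PER_LONG;
    }
}
```

For example, a hypothetical 300,000,000 x 10 `long` matrix needs `bytesNeeded(300_000_000L, 10)` = 24,000,000,000 bytes (about 24 GB), while the product `300_000_000 * 10` evaluated in `int` arithmetic would silently overflow.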

Preferably, I want to solve this without using a database that requires installing a server, as my program will be used by people other than me.

I am more than grateful for any help and advice!


If the arrays are larger than can be stored in your computer's RAM, then, obviously, you should store part of the array or its entirety on disk.

For this purpose, you can use a database. Since you don't want to install a server, you can use an embedded database such as HSQLDB. You can configure HSQLDB either to delete all data when your application terminates or to retain it for future use.

An alternative is to use a custom Map implementation that flushes data to secondary storage whenever its size exceeds a threshold you define. Several strategies are available for choosing which entries to flush: FIFO, LIFO, LRU, etc. Likewise, whenever you need to access a certain element of the map, you can load a block of adjacent elements from disk at once (or, again, use a strategy better suited to your use case) to reduce excessive disk I/O.
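A minimal sketch of that Map-based approach, assuming LRU eviction and the row number (a long) as the key, as the question suggests. `SpillingRowCache` is a hypothetical name, not a library class. It builds on the JDK's `LinkedHashMap` in access order, whose `removeEldestEntry` hook is the standard way to bound a map's size; the evicted row is written to a backing file at a position derived from its row number.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU row cache: at most maxRowsInMemory rows in RAM, the rest spilled to a file. */
final class SpillingRowCache implements AutoCloseable {
    private final RandomAccessFile file;
    private final int cols;
    private final LinkedHashMap<Long, double[]> hot;

    SpillingRowCache(String path, int cols, int maxRowsInMemory) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.cols = cols;
        // accessOrder = true: iteration order runs from least to most recently accessed.
        this.hot = new LinkedHashMap<Long, double[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, double[]> eldest) {
                if (size() <= maxRowsInMemory) return false;
                spill(eldest.getKey(), eldest.getValue()); // write the LRU row to disk
                return true;                               // ...then drop it from RAM
            }
        };
    }

    /** Writes one row at its fixed position (row * cols * 8 bytes) in the backing file. */
    private void spill(long row, double[] values) {
        try {
            file.seek(row * cols * Double.BYTES);
            for (double v : values) file.writeDouble(v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void put(long row, double[] values) {
        hot.put(row, values.clone());
    }

    /**
     * Returns the row from RAM, or reloads it from disk (possibly evicting another row).
     * Returns null if the row lies beyond the end of the file. Sketch limitation: rows inside
     * the file that were never written read back as unspecified content.
     */
    double[] get(long row) throws IOException {
        double[] cached = hot.get(row);
        if (cached != null) return cached;
        long pos = row * cols * Double.BYTES;
        if (pos + (long) cols * Double.BYTES > file.length()) return null;
        file.seek(pos);
        double[] loaded = new double[cols];
        for (int i = 0; i < cols; i++) loaded[i] = file.readDouble();
        hot.put(row, loaded);
        return loaded;
    }

    /** Flushes every row still in RAM so the file holds all data, then closes it. */
    @Override
    public void close() throws IOException {
        for (Map.Entry<Long, double[]> e : hot.entrySet()) spill(e.getKey(), e.getValue());
        file.close();
    }
}
```

Reloading one row per seek keeps the sketch short; the bulk loading of adjacent rows described above would read several consecutive rows in one go instead.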


For storing this data you could use NetCDF or HDF5. Both let you read and write subsets of arrays.
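If adding a NetCDF or HDF5 dependency is not an option, the same idea of reading and writing only a subset of an array can be sketched with the JDK's `FileChannel`. This assumes a headerless, row-major file of 8-byte doubles; `DiskDoubleMatrix` and its methods are hypothetical names for illustration, not part of any library.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Sketch of a disk-backed double[rows][cols] matrix. Only the rows you read are in RAM.
 * Assumed layout: row-major, 8 bytes per double (big-endian, ByteBuffer's default), no header.
 */
final class DiskDoubleMatrix implements AutoCloseable {
    private final FileChannel channel;
    private final int cols;

    DiskDoubleMatrix(Path file, int cols) throws IOException {
        this.cols = cols;
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    /** Byte position of the first value in a row; long arithmetic avoids int overflow. */
    private long offset(long row) {
        return row * cols * Double.BYTES;
    }

    void writeRow(long row, double[] values) throws IOException {
        if (values.length != cols) {
            throw new IllegalArgumentException("expected " + cols + " values");
        }
        ByteBuffer buf = ByteBuffer.allocate(cols * Double.BYTES);
        buf.asDoubleBuffer().put(values); // fills buf's content; buf's own position stays 0
        long pos = offset(row);
        while (buf.hasRemaining()) {
            pos += channel.write(buf, pos); // positional write; may extend the file
        }
    }

    double[] readRow(long row) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(cols * Double.BYTES);
        long pos = offset(row);
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos);
            if (n < 0) throw new IOException("row " + row + " is past the end of the file");
            pos += n;
        }
        buf.flip();
        double[] values = new double[cols];
        buf.asDoubleBuffer().get(values);
        return values;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

In the question's step 2, each batch's result row would be passed to `writeRow`, so only the current batch is ever held in RAM. `FileChannel.map` (memory mapping) is another option, but a single `MappedByteBuffer` is limited to `Integer.MAX_VALUE` bytes, so a very large file would need several mappings.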


Managing subsets of the data is likely to be the best solution.
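For the sorting step (step 3 in the question), managing subsets usually means an external merge sort: sort RAM-sized chunks, write each out as a sorted run, then merge the runs while holding only one value per run in memory. The sketch below simplifies the problem to sorting a file of plain doubles; `ExternalDoubleSort` is a hypothetical name, and sorting whole rows of the double[][] by a key column would need a record format and comparator that are not shown here.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch of an external merge sort for a file of doubles larger than RAM. */
final class ExternalDoubleSort {
    private ExternalDoubleSort() {}

    static void sort(Path input, Path output, int chunkSize) throws IOException {
        List<Path> runs = new ArrayList<>();
        // Phase 1: read up to chunkSize values, sort them in memory, write a sorted run file.
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(input)))) {
            double[] chunk = new double[chunkSize];
            boolean eof = false;
            while (!eof) {
                int n = 0;
                try {
                    while (n < chunkSize) {
                        double v = in.readDouble(); // read first, so n only counts stored values
                        chunk[n++] = v;
                    }
                } catch (EOFException e) {
                    eof = true;
                }
                if (n == 0) break;
                Arrays.sort(chunk, 0, n);
                Path run = Files.createTempFile("run", ".bin");
                try (DataOutputStream out = new DataOutputStream(
                        new BufferedOutputStream(Files.newOutputStream(run)))) {
                    for (int i = 0; i < n; i++) out.writeDouble(chunk[i]);
                }
                runs.add(run);
            }
        }
        // Phase 2: k-way merge of all runs, then remove the temporary run files.
        merge(runs, output);
        for (Path run : runs) Files.deleteIfExists(run);
    }

    /** An open run plus the value currently at its head. */
    private static final class Cursor {
        final DataInputStream in;
        double head;

        Cursor(DataInputStream in) { this.in = in; }

        /** Moves to the next value; returns false once the run is exhausted. */
        boolean advance() throws IOException {
            try {
                head = in.readDouble();
                return true;
            } catch (EOFException e) {
                return false;
            }
        }
    }

    private static void merge(List<Path> runs, Path output) throws IOException {
        PriorityQueue<Cursor> heap = new PriorityQueue<>((a, b) -> Double.compare(a.head, b.head));
        List<Cursor> opened = new ArrayList<>();
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(output)))) {
            for (Path run : runs) {
                Cursor c = new Cursor(new DataInputStream(
                        new BufferedInputStream(Files.newInputStream(run))));
                opened.add(c);
                if (c.advance()) heap.add(c);
            }
            while (!heap.isEmpty()) {
                Cursor c = heap.poll();    // run with the smallest current value
                out.writeDouble(c.head);
                if (c.advance()) heap.add(c);
            }
        } finally {
            for (Cursor c : opened) c.in.close();
        }
    }
}
```

`Arrays.sort(double[], int, int)` and `Double.compare` use the same total order, so the runs and the merge agree on where values such as `-0.0` and `NaN` belong.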

However, you should ask yourself whether you are using the right machine for the job. You can buy a new PC with a 2.5 GHz Core 2 Duo and 4 GB of memory for £225, a quad-core AMD with 8 GB for £380, or 16 GB of memory for £320.

My point is that your time is worth something: you need to weigh how much work it will take you, now and in the future, to save memory against how much that memory is worth.
